cms-sw / cmsdist

CMS Offline Software build configuration

[TF] Update TF v2.16.1 (without libfft) #9388

Closed iarspider closed 4 days ago

iarspider commented 1 week ago

Alternative version of #9241

cmsbuild commented 1 week ago

A new Pull Request was created by @iarspider for branch IB/CMSSW_14_2_X/tf.

@aandvalenzuela, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this. cms-bot commands are listed here

cmsbuild commented 1 week ago

cms-bot internal usage

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

cmsbuild commented 1 week ago

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41248/summary.html
COMMIT: f38a19438c4731f350ad1df7594d3afec91ddfac
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41248/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41248/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41248/git-merge-result

Comparison Summary

Summary:

cmsbuild commented 1 week ago

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41261/summary.html
COMMIT: f38a19438c4731f350ad1df7594d3afec91ddfac
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41261/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41261/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41261/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test TestGeneratorInterfacePythia8ConcurrentGeneratorFilter had ERRORS

cmsbuild commented 1 week ago

Pull request #9388 was updated.

cmsbuild commented 1 week ago

Pull request #9388 was updated.

iarspider commented 1 week ago

test TestGeneratorInterfacePythia8ConcurrentGeneratorFilter had ERRORS

Generators' web service acting up, not caused by this PR.

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 1 week ago

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41266/summary.html
COMMIT: 84be8329fe6357df8c1ecf3e7bae311cc7c25a82
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41266/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

File "/pool/condor/dir_852039/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/py3-keras/3.5.0-2311ec30788f593a06ec5c6a7cad014d/lib/python3.9/site-packages/keras/src/activations/activations.py", line 1, in <module>
from keras.src import backend
File "/pool/condor/dir_852039/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/py3-keras/3.5.0-2311ec30788f593a06ec5c6a7cad014d/lib/python3.9/site-packages/keras/src/backend/__init__.py", line 47, in <module>
raise ValueError(f"Unable to import backend : {backend()}")
ValueError: Unable to import backend : theano
error: Bad exit status from /pool/condor/dir_852039/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.GVH9wF (%build)


RPM build errors:
line 42: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tfaot-model-test-simple+1.0.1-5417541d3ca11e7ad7e8b0347e6ab9f9
Macro expanded in comment on line 393: %{aot_config}, pointing to the aot config file of the model to compile (required)

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

No idea why it failed to build; it works fine locally.

cmsbuild commented 1 week ago

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41270/summary.html
COMMIT: 84be8329fe6357df8c1ecf3e7bae311cc7c25a82
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41270/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

File "/pool/condor/dir_2818171/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/py3-keras/3.5.0-2311ec30788f593a06ec5c6a7cad014d/lib/python3.9/site-packages/keras/src/activations/activations.py", line 1, in <module>
from keras.src import backend
File "/pool/condor/dir_2818171/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/py3-keras/3.5.0-2311ec30788f593a06ec5c6a7cad014d/lib/python3.9/site-packages/keras/src/backend/__init__.py", line 47, in <module>
raise ValueError(f"Unable to import backend : {backend()}")
ValueError: Unable to import backend : theano
error: Bad exit status from /pool/condor/dir_2818171/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Mixruy (%build)


RPM build errors:
line 42: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tfaot-model-test-multi+1.0.1-81ccb1815e041543b851768c760fd616
Macro expanded in comment on line 393: %{aot_config}, pointing to the aot config file of the model to compile (required)

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 1 week ago

Pull request #9388 was updated.

cmsbuild commented 1 week ago

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41276/summary.html
COMMIT: 8df8a959f85d050f6c9802d1085c550bdcc55ddf
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41276/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation warning when building: See details on the summary page.

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 1 week ago

Pull request #9388 was updated.

cmsbuild commented 1 week ago

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41277/summary.html
COMMIT: 0509280715e282ea38e20b5894937e2fabb06d13
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41277/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation warning when building: See details on the summary page.

cmsbuild commented 1 week ago

Pull request #9388 was updated.

iarspider commented 1 week ago

@smuzaffar the backend is overridden by the file ~/.keras/keras.json. It seems to be present on all cmsbuild machines, with a timestamp around the date when each node was created. The setting in that file can in turn be overridden with an environment variable (KERAS_BACKEND) at import time.

smuzaffar commented 1 week ago

@iarspider, can you please check how this file was created? I don't remember us deploying it via Puppet.

iarspider commented 1 week ago

It is created here: https://github.com/keras-team/keras/blob/v3.5.0/keras/src/backend/config.py#L221-L241. _KERAS_DIR defaults to ~/.keras, but can be overridden by setting $KERAS_HOME.
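
The lookup order discussed here can be modelled as two small pure functions. This is only a sketch of the precedence as described in this thread, not Keras code (Keras implements it in Python in the config.py linked above). The names `KerasInputs`, `resolveKerasDir` and `resolveBackend` are made up for illustration, an empty string stands for "not set", and the "tensorflow" fallback is an assumption about the Keras 3 default.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of the inputs that decide which backend is picked.
// An empty string means "not set" / "not present".
struct KerasInputs {
  std::string home;         // user's home directory
  std::string kerasHome;    // value of $KERAS_HOME
  std::string fileBackend;  // "backend" entry read from keras.json
  std::string envBackend;   // value of $KERAS_BACKEND
};

// Directory where keras.json is looked up: $KERAS_HOME wins over ~/.keras.
std::string resolveKerasDir(const KerasInputs& in) {
  if (!in.kerasHome.empty()) return in.kerasHome;
  assert(!in.home.empty());  // without $KERAS_HOME a home directory is required
  return in.home + "/.keras";
}

// Backend precedence: $KERAS_BACKEND, then keras.json, then the assumed default.
std::string resolveBackend(const KerasInputs& in) {
  if (!in.envBackend.empty()) return in.envBackend;
  if (!in.fileBackend.empty()) return in.fileBackend;
  return "tensorflow";
}
```

With a stale keras.json that says "theano" and no environment variable, `resolveBackend` returns "theano", which matches the build failure above; setting `envBackend` to "tensorflow" makes the file irrelevant.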

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 1 week ago

Pull request #9388 was updated.

iarspider commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

cmsbuild commented 1 week ago

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41285/summary.html
COMMIT: 1ba91fcf806f9fd4068e54ff36832bc0d0624482
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41285/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41285/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41285/git-merge-result

Comparison Summary

Summary:

smuzaffar commented 1 week ago

enable gpu

smuzaffar commented 1 week ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 1 week ago

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41304/summary.html
COMMIT: 1ba91fcf806f9fd4068e54ff36832bc0d0624482
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41304/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41304/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41304/git-merge-result

cmsbuild commented 1 week ago

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41308/summary.html
COMMIT: 1ba91fcf806f9fd4068e54ff36832bc0d0624482
CMSSW: CMSSW_14_2_TF_X_2024-09-02-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41308/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41308/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41308/git-merge-result

GPU Unit Tests

I found 9 errors in the following unit tests:

---> test testTFMetaGraphLoadingCUDA had ERRORS
---> test testTFGraphLoadingCUDA had ERRORS
---> test testTFConstSessionCUDA had ERRORS
and more ...

Comparison Summary

Summary:

iarspider commented 1 week ago

@smuzaffar since the tests depend on TensorFlow (not Keras), the environment was not set. Should I add an explicit dependency on Keras to PhysicsTools/TensorFlow? I don't think we can handle a circular dependency keras ↔ tensorflow.

smuzaffar commented 6 days ago

@iarspider, can you try running the unit tests locally after setting KERAS_BACKEND=tensorflow in the environment? Note that ## INITENV SET KERAS_BACKEND tensorflow only sets this variable via the init.*sh files. For scram, one needs to update the xml file to set it. So if setting KERAS_BACKEND=tensorflow fixes the GPU unit tests, then please add KERAS_BACKEND=tensorflow to one of the TF xml files.
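
Before concluding anything from a local run, it may help to confirm that the variable actually reached the test process, since the init.*sh route and the scram route set the environment differently. A minimal sketch of such a check, assuming a POSIX environment; `kerasBackendMatches` is a hypothetical helper, not part of CMSSW or TensorFlow:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns true when KERAS_BACKEND is visible to this process and equals `expected`.
// An unset or empty variable counts as a mismatch.
bool kerasBackendMatches(const std::string& expected) {
  assert(!expected.empty());  // an empty expectation would be a caller bug
  const char* value = std::getenv("KERAS_BACKEND");
  return value != nullptr && expected == value;
}
```

A test could call `kerasBackendMatches("tensorflow")` at start-up and print a warning when it returns false, which separates "the variable was never set for scram" from a genuine TensorFlow problem.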

cmsbuild commented 6 days ago

Pull request #9388 was updated.

iarspider commented 6 days ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

iarspider commented 6 days ago

please abort

cmsbuild commented 6 days ago

Pull request #9388 was updated.

iarspider commented 6 days ago

@cmsbuild please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

cmsbuild commented 6 days ago

-1

Failed Tests: UnitTests GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41405/summary.html
COMMIT: 34311da36d5769e4cc2373c8549ce6b225f0cbfb
CMSSW: CMSSW_14_2_TF_X_2024-09-06-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9388/41405/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41405/git-recent-commits.json https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-eef346/41405/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test testSiStripPayloadInspector had ERRORS

GPU Unit Tests

I found 9 errors in the following unit tests:

---> test testBrokenLineFitGPU_t had ERRORS
---> test testFitsGPU_t had ERRORS
---> test testTFGraphLoadingCUDA had ERRORS
and more ...

Comparison Summary

Summary:

iarspider commented 5 days ago

ChatGPT suggests using Session::ListDevices to check if GPU is available, instead of mutable_device_count():

#include "tensorflow/core/public/session.h"
#include "tensorflow/core/protobuf/config.pb.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/common_runtime/device_factory.h"

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Initialize a session
    tensorflow::Session* session = nullptr;
    tensorflow::SessionOptions options;

    // Try to create a new session
    tensorflow::Status status = tensorflow::NewSession(options, &session);
    if (!status.ok()) {
        std::cerr << "Error creating session: " << status.ToString() << std::endl;
        return -1;
    }

    // Retrieve the list of available devices
    std::vector<tensorflow::DeviceAttributes> devices;
    status = session->ListDevices(&devices);
    if (!status.ok()) {
        std::cerr << "Error listing devices: " << status.ToString() << std::endl;
        return -1;
    }

    // Check if any GPU devices are available
    bool gpu_available = false;
    for (const auto& device : devices) {
        std::cout << "Device name: " << device.name() << ", type: " << device.device_type() << std::endl;
        if (device.device_type() == "GPU") {
            gpu_available = true;
        }
    }

    if (gpu_available) {
        std::cout << "GPU is available and can be used." << std::endl;
    } else {
        std::cout << "No GPU devices are available." << std::endl;
    }

    // Clean up
    session->Close();
    delete session;

    return 0;
}
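
The GPU decision in the loop above can also be pulled out into a function that does not need a live session, so it can be exercised on machines without a GPU. This is a sketch only: `DeviceInfo` is a hypothetical stand-in for the two fields of `tensorflow::DeviceAttributes` used above, and `hasDeviceOfType` is an illustrative name, not a TensorFlow API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for tensorflow::DeviceAttributes (name and device_type only).
struct DeviceInfo {
  std::string name;
  std::string type;
};

// Returns true if at least one listed device has the requested type, e.g. "GPU".
bool hasDeviceOfType(const std::vector<DeviceInfo>& devices, const std::string& type) {
  assert(!type.empty());  // an empty type would never match a real device
  return std::any_of(devices.begin(), devices.end(),
                     [&type](const DeviceInfo& d) { return d.type == type; });
}
```

In real code the vector would be filled from the result of `session->ListDevices(&devices)`, copying `name()` and `device_type()` from each entry.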