Closed valsdav closed 1 day ago
assign ml
cms-bot internal usage
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42576
New categories assigned: ml
@valsdav,@y19y19 you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Pull Request was created by @valsdav for master.
It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please review it and eventually sign? Thanks. @Prasant1993, @Sam-Harper, @a-kapoor, @afiqaize, @jainshilpi, @lgray, @missirol, @ram1123, @rovere, @sameasy, @sobhatta, @varuns23 this is something you requested to watch as well. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here
please test
-1
Failed Tests: UnitTests RelVals RelVals-INPUT AddOn
Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fe9c2/42719/summary.html
COMMIT: 90d7ded04bc3a8de4099feaadf777a77b80876ec
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46655/42719/install.sh
to create a dev area with all the needed externals and cmssw changes.
I found 4 errors in the following unit tests:
---> test test_MC_22_crosscheck had ERRORS ---> test test_MC_23_crosscheck had ERRORS ---> test test_MC_23_setup had ERRORS and more ...
139.001A fatal system signal has occurred: segmentation violation
2022.000001A fatal system signal has occurred: segmentation violation
2022.002001A fatal system signal has occurred: segmentation violation
138.4138.4_PromptCollisions2021/step2_PromptCollisions2021.log
138.5138.5_ExpressCollisions2021/step2_ExpressCollisions2021.log
2024.0020012024.002001_RunJetMET02024B_10k/step1_dasquery.log
2024.002001
2024.003001
2024.101001
2024.103001
2024.300001
2024.301001
A fatal system signal has occurred: segmentation violation
A fatal system signal has occurred: segmentation violation
A fatal system signal has occurred: segmentation violation
please test
please abort test
please test
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42587
Pull request #46655 was updated. @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please check and sign again.
-1
Failed Tests: Build ClangBuild
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fe9c2/42736/summary.html
COMMIT: 83d1d980bc4d87c684d97c733c98984907aec08a
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46655/42736/install.sh
to create a dev area with all the needed externals and cmssw changes.
I found compilation warning when building: See details on the summary page.
I found compilation error while trying to compile with clang. Command used:
USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'
>> Entering Package RecoEgamma/EgammaPhotonProducers >> Entering Package RecoEgamma/EgammaTools >> Entering Package RecoEgamma/ElectronIdentification >> Entering Package RecoEgamma/PhotonIdentification >> Compile sequence completed for CMSSW CMSSW_14_2_X_2024-11-11-1100 gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1 + eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_2_X_2024-11-11-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)' ++ scram build outputlog >> Entering Package RecoEgamma/EgammaElectronProducers Entering library rule at src/RecoEgamma/EgammaElectronProducers/plugins >> Compiling edm plugin src/RecoEgamma/EgammaElectronProducers/plugins/ElectronNHitSeedProducer.cc
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42591
Pull request #46655 was updated. @cmsbuild, @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please check and sign again.
please test
+1
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fe9c2/42744/summary.html
COMMIT: 22246a6567688228476da6fa57a6dd9cb62255f7
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46655/42744/install.sh
to create a dev area with all the needed externals and cmssw changes.
Summary:
Hi @makortel, I would like to reproduce the memory profile of #46040 to compare the memory usage. Would you have any script to share to reproduce the same igProf memory profile that's in the issue? Thanks!
Below is an untested recipe to reproduce what I did for https://github.com/cms-sw/cmssw/issues/46494
# Set up developer area
cmssw-el7
export SCRAM_ARCH=slc7_amd64_gcc12
cmsrel CMSSW_14_0_16
cd CMSSW_14_0_16/src
cmsenv
# Set up the configuration
cp /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1/job/WMTaskSpace/cmsRun1/PSet.p* .
xrdcp root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2024H/Muon1/RAW/v1/000/386/319/00000/27604955-1005-455b-90eb-8195e2f02cb8.root .
cat >>PSet.py <<EOF
process.source.fileNames = ["file:27604955-1005-455b-90eb-8195e2f02cb8.root"]
process.maxEvents.input = 20
process.options.numberOfThreads = 4
process.options.numberOfStreams = 4
process.IgProfService = cms.Service('IgProfService',
reportFirstEvent = cms.untracked.int32(0),
reportEventInterval = cms.untracked.int32(10),
reportToFileAtPostEvent = cms.untracked.string('| gzip -c > 4th_4st_07.mem.%I.gz')
)
EOF
# Run IgProf memory profiler
igprof -d -t cmsRunGlibC -z -mp -o test_07.mem.gz cmsRunGlibC PSet.py > out.txt 2>&1
igprof-analyse --sqlite -d -v -g -r MEM_LIVE test_4th_4st_07.mem.20.gz | sqlite3 test_4th_4st_07.20_live.sql3
# Make the memory profile web browseable (needs a web area with cgi-bin access)
cp (which igprof-navigator) <web dir>/cgi-bin
mkdir <web dir>/cgi-bin/data
cp test_4th_4st_07.20_live.sql3 <web dir>/cgi-bin/data
Open web browser to <web dir>/cgi-bin/igprof-navigator/test_4th_4st_07.20_live
.
Thanks a lot! Unfortunately the data file is no more available, can I just use another one?
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42713
Pull request #46655 was updated. @cmsbuild, @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please check and sign again.
Below is an untested recipe to reproduce what I did for #46494
# Set up developer area cmssw-el7 export SCRAM_ARCH=slc7_amd64_gcc12 cmsrel CMSSW_14_0_16 cd CMSSW_14_0_16/src cmsenv # Set up the configuration cp /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1/job/WMTaskSpace/cmsRun1/PSet.p* . xrdcp root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2024H/Muon1/RAW/v1/000/386/319/00000/27604955-1005-455b-90eb-8195e2f02cb8.root . cat >>PSet.py <<EOF process.source.fileNames = ["file:27604955-1005-455b-90eb-8195e2f02cb8.root"] process.maxEvents.input = 20 process.options.numberOfThreads = 4 process.options.numberOfStreams = 4 process.IgProfService = cms.Service('IgProfService', reportFirstEvent = cms.untracked.int32(0), reportEventInterval = cms.untracked.int32(10), reportToFileAtPostEvent = cms.untracked.string('| gzip -c > 4th_4st_07.mem.%I.gz') ) EOF # Run IgProf memory profiler igprof -d -t cmsRunGlibC -z -mp -o test_07.mem.gz cmsRunGlibC PSet.py > out.txt 2>&1 igprof-analyse --sqlite -d -v -g -r MEM_LIVE test_4th_4st_07.mem.20.gz | sqlite3 test_4th_4st_07.20_live.sql3 # Make the memory profile web browseable (needs a web area with cgi-bin access) cp (which igprof-navigator) <web dir>/cgi-bin mkdir <web dir>/cgi-bin/data cp test_4th_4st_07.20_live.sql3 <web dir>/cgi-bin/data
Open web browser to
<web dir>/cgi-bin/igprof-navigator/test_4th_4st_07.20_live
.
Hi @makortel, I managed to replicate the recipe. There was a crash at the end of the processing but it seems that the igprof report has been filled. Unfortunately my cgi-bin setup has some problem (it was working in the past). Would you mind copying the file in your folder please? https://dvalsecc.web.cern.ch/cgi-bin/data/test_4th_4st_07.20_live.sql3
please test
+1
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fe9c2/42966/summary.html
COMMIT: 2a58ae2f7976db3ca790e20cc381687d99933216
CMSSW: CMSSW_14_2_X_2024-11-19-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46655/42966/install.sh
to create a dev area with all the needed externals and cmssw changes.
Summary:
Milestone for this pull request has been moved to CMSSW_15_0_X. Please open a backport if it should also go in to CMSSW_14_2_X.
+1
+ml
Technical PR, moved TF sessions to global cache
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @mandrenguyen, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
@makortel we probably want to backport this to the release used in T0 processing at the moment, right?
Would you mind copying the file in your folder please? https://dvalsecc.web.cern.ch/cgi-bin/data/test_4th_4st_07.20_live.sql3
I can, but I'd need a path to the file on lxplus that I could access (access through https
results in the cgi-bin
script being called that fails with internal server error, and I don't have permissions to access the file through EOS)
we probably want to backport this to the release used in T0 processing at the moment, right?
Based on https://github.com/cms-sw/cmssw/issues/46494 the expected memory savings would be about 7 MB for an 8-stream job. It isn't much compared to a full reconstruction job, but it is something nevertheless. My take is that backports would not be essential, but if done, I'd do backport also to 14_0_X to potentially benefit the MC production.
+1
PR description:
Solves #46494. Related to #46040
EgammaDNNHelper
was storing TF graphs globally by job, but TF sessions were owned my theGSFElectronProducer
andGEDPhotonProducers
by stream. This PR moves the TFSessions in theEgammaDNNHelper
, making them global by job.PR validation:
Purely technical PR, no changes expected.