Closed jfernan2 closed 3 months ago
A new Pull Request was created by @jfernan2 for branch master.
@aandvalenzuela, @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks. @antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this. cms-bot commands are listed here
cms-bot internal usage
Hi @jfernan2, should you use 29834.21 instead of 29634.21? It is a PU workflow.
Pull request #2282 was updated.
Correct, @srimanob. I have fixed it. Thanks!
enable profiling
please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7a973e/40235/summary.html
COMMIT: 743195fd4c976150152f9bf126a2e5fd6cad6bdc
CMSSW: CMSSW_14_1_X_2024-07-04-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2282/40235/install.sh
to create a dev area with all the needed externals and cmssw changes.
Summary:
@jfernan2, it looks like 12634.21 is not a valid workflow (at least it is not in 14.1.X). Or do we need to pass an extra option to runTheMatrix.py to run it?
Hi @smuzaffar
It will need relvals_opt = --what upgrade, as the workflow has not been defined in https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_2017.py yet.
I created the following PR, since the workflow should be defined in relval_2017: https://github.com/cms-sw/cmssw/pull/45381
So we need a way to tell the bot how to run workflows that are not active by default in runTheMatrix. We can add a variable, e.g. PROFILING_OPTS="-w upgrade,standard", in https://github.com/cms-sw/cms-bot/blob/master/cmssw-pr-test-config so that the bot can use it when running runTheMatrix.
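A minimal sketch of how such a variable could be consumed, purely for illustration. It assumes the config file holds shell-style KEY="value" lines; the helper names parse_config and build_matrix_command are hypothetical and are not taken from cms-bot. Only the -w and -l options of runTheMatrix.py and the PROFILING_OPTS value proposed above are used.

```python
import shlex


def parse_config(text):
    """Parse shell-style KEY="value" lines, skipping blanks and comments.

    Assumption: the config file uses this simple format.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes the surrounding quotes; re-join the words
        # so the option string is kept as a single value.
        config[key.strip()] = " ".join(shlex.split(value))
    return config


def build_matrix_command(workflows, extra_opts=""):
    """Build a runTheMatrix.py argument list for the given workflows."""
    cmd = ["runTheMatrix.py"]
    cmd += shlex.split(extra_opts)       # e.g. ["-w", "upgrade,standard"]
    cmd += ["-l", ",".join(workflows)]   # comma-separated workflow list
    return cmd


# Hypothetical config content, using the variable proposed in this thread.
EXAMPLE_CONFIG = 'PROFILING_OPTS="-w upgrade,standard"\n'

config = parse_config(EXAMPLE_CONFIG)
command = build_matrix_command(
    ["12634.21", "29834.21"], config.get("PROFILING_OPTS", "")
)
print(shlex.join(command))
```

If PROFILING_OPTS is missing, the command falls back to the default workflow set, which is the situation that made 12634.21 look invalid.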
That would be useful so that we can handle upgrade workflows. Thanks @smuzaffar. Since the workflows we need should be in relval_2017 anyway, maybe we can merge https://github.com/cms-sw/cmssw/pull/45381 (after PR tests), followed by this PR.
Dear @srimanob and @smuzaffar, now that https://github.com/cms-sw/cmssw/pull/45381 has been merged, could we revive this PR? Thanks
please test
+externals looks good
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7a973e/40434/summary.html
COMMIT: 743195fd4c976150152f9bf126a2e5fd6cad6bdc
CMSSW: CMSSW_14_1_X_2024-07-16-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2282/40434/install.sh
to create a dev area with all the needed externals and cmssw changes.
Summary:
@smuzaffar after the inclusion of this PR, I thought we could have the profiling comparison in PR tests, but it looks like something else is missing. See for example the last trial: https://github.com/cms-sw/cmssw/pull/45333
Profiling results for 12634.21 and 29834.21 are there, but the comparison is empty: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/summary.html
I fail to see why, since the logs apparently show everything is OK: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/testsResults/profiling.txt
Do you have any hint? Thanks
No, I was referring to the igprof text results comparison, these comparison files are empty: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/profiling/12634.21/RES_CPU_compare_12634.21.txt https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/profiling/29834.21/RES_CPU_compare_29834.21.txt Thanks
may be @gartung knows why profiling comparison is empty
@gartung is on vacation until August 9.
It looks like a segfault in the reco step for 12634.21 https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/profiling/12634.21/step3_igprof_cpu.log
Segfault in the reco step for 29834.21 as well https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d1a85e/40514/profiling/29834.21/step3_igprof_cpu.log
Thanks @gartung I am trying a last time since it worked in https://github.com/cms-sw/cms-bot/issues/1912
@gartung the same two workflows 29834.21 and 12634.21 run fine with igprof in Jenkins profiling[1]
I believe the difference is in how igprof is called with -j JobReport
igprof -pp -d -t cmsRun -z -o ./igprofCPU_step3.gz -- cmsRun step3_igprof.py -j step3_igprof_cpu_JobReport.xml >& step3_igprof_cpu.log -> crashes
igprof -d -pp -z -o step3_igprofCPU.gz -t cmsRun cmsRun step3_igprof.py -> runs fine
Somehow it was removed for igprof here [2], and those XML files do not seem to be used anywhere, so I am proposing the following PR, if you agree:
https://github.com/cms-cmpwg/profiling/pull/8
[1] https://cmssdt.cern.ch/jenkins/job/release-run-reco-profiling/533/console https://cmssdt.cern.ch/jenkins/job/release-run-reco-profiling/538/console
[2] https://github.com/cms-sw/cms-bot/blob/5bac572863f680e3d69e22005e437665b5df666f/reco_profiling/profileRunner.py#L335
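To make the suspected difference easier to test in isolation, here is a small sketch that builds the igprof CPU command with and without the cmsRun job report option. The helper name igprof_cpu_command is hypothetical; the flags and file names are copied from the two invocations quoted above, and the sketch follows the flag order of the one that runs fine. The two quoted commands also differ in flag order and in the use of `--`, which this sketch deliberately leaves out.

```python
import shlex


def igprof_cpu_command(config, output, job_report=None):
    """Return the igprof CPU profiling command as an argument list.

    Flags mirror the working invocation quoted in this thread; the only
    variable part is the optional cmsRun '-j <JobReport.xml>' argument.
    """
    cmd = ["igprof", "-d", "-pp", "-z", "-o", output,
           "-t", "cmsRun", "cmsRun", config]
    if job_report is not None:
        # Variant suspected of crashing: ask cmsRun for a job report.
        cmd += ["-j", job_report]
    return cmd


without_report = igprof_cpu_command("step3_igprof.py", "step3_igprofCPU.gz")
with_report = igprof_cpu_command(
    "step3_igprof.py", "step3_igprofCPU.gz",
    job_report="step3_igprof_cpu_JobReport.xml",
)

print(shlex.join(without_report))
print(shlex.join(with_report))
```

Running both variants on the same workflow and release would show whether the job report alone is enough to trigger the crash.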
I believe the difference is in how igprof is called with -j JobReport
I'd find it very strange if the framework job report would be causing segfaults under IgProf, but who knows.
Me too @makortel but I have just repeated the test in Jenkins with success for both workflows using igprof and no JobReport output....
@smuzaffar igprof is still giving problems. However, the baseline does not seem to be running igprof for the two workflows in question (12634.21 and 29834.21), so we are missing the reference anyway. See for example this recent trial: https://cmssdt.cern.ch/SDT/jenkins-artifacts/ib-baseline-tests/CMSSW_14_2_X_2024-09-01-2300/el8_amd64_gcc12/-GenuineIntel/matrix-results/
Any idea? Thanks
Changed to monitor wf 29834.21 (D110 upgrade) and 12634.21 (Run3 2023)