Open acmbulla opened 6 months ago
A new Issue was created by @acmbulla.
@rappoccio, @Dr15Jones, @antoniovilela, @smuzaffar, @makortel, @sextonkennedy can you please review it and eventually sign/assign? Thanks.
assign dqm
New categories assigned: dqm
@rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini you have been requested to review this Pull request/Issue and eventually sign? Thanks
I've always used the framework in CMSSW_11_1_4 (for other reasons, I'm familiar with that env), but if you try to use root6.20 to open a root6.30 file you will encounter a segmentation violation error (case1).
This is (unfortunately) kind of expected, more details in https://github.com/cms-sw/cmssw/issues/43882. We backported ROOT's fix to open a file produced with ROOT 6.30 down to 12_4_X, but are not planning to go for earlier release cycles (because we are not aware of any production use case requiring that).
Could you post the compilation error of case 2?
type tracking
Hi,
There is no compilation error. The macro starts and simply gets stuck at its first print.
./makeValidationPlots.sh 369978 DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root pre0 DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0_ROOT630-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root pre0_root630 369978_ZeroBias_CMSSW_14_0_0_pre0_vs_14_0_0_pre0_root630 D 1. 1.
Generating some index files before comparing the histos
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
/afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions
Analyzing DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root and DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0_ROOT630-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root in 369978_ZeroBias_CMSSW_14_0_0_pre0_vs_14_0_0_pre0_root630
Processing /afs/cern.ch/user/a/abulla/CMSSW_14_0_0/src/TkRelVal/collisions/runValidationComparison.C("DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root","pre0","DQM/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0_ROOT630-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root","pre0_root630","/eos/project/c/cmsweb/www/tracking/validation/DATA/369978_ZeroBias_CMSSW_14_0_0_pre0_vs_14_0_0_pre0_root630","D",1.,"false")...
Info in
It stays like this for hours, and when you gently kill the process (Ctrl+C) you get
^CError in
I was able to simplify the reproducer to
# In ROOT 6.30 environment, e.g. CMSSW_14_0_0
$ git clone -b bug_report https://github.com/CMSTrackingPOG/TkRelVal.git
$ cd TkRelVal/collisions
$ root
root [0] .L ReleaseComparison.cpp+
# takes ages
The file in question is https://github.com/CMSTrackingPOG/TkRelVal/blob/bug_report/collisions/ReleaseComparison.cpp and has ~4k lines. The usual recommendation would be to split the code into more source files.
I found out a way to run the actual compilation command (such that intermediate files are kept)
root [1] gSystem->CompileMacro("ReleaseComparison.cpp", "cvd")
Repeating the slow compilation command with -O0 and -O1 is quite fast, while -O2 and the default -O3 take a long time (I didn't wait).
I'd be tempted to conclude the source file is just too heavy to be compiled in a reasonable time with optimizations enabled. If the problem is in e.g. the inlining of the frequently called functions createTH1FPlot() and createTProfPlot(), maybe moving them to a separate source file, and compiling and loading it separately into ROOT, would help to reduce the total compilation time?
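A minimal sketch of such a split, assuming a hypothetical and heavily simplified helper: plotOutputPath() and PlotRequest are illustrative stand-ins, not the real createTH1FPlot() signature, which works on ROOT objects. The three sections would live in separate files; they are shown in one block only so that the sketch compiles as a single unit.

```cpp
#include <string>
#include <vector>

// ---- PlotHelpers.h (hypothetical file): declarations only ----
struct PlotRequest {
  std::string histPath;  // path of the histogram inside the DQM file
  std::string outName;   // base name of the image to write
};
std::string plotOutputPath(const PlotRequest& req, const std::string& outDir);

// ---- PlotHelpers.cpp (hypothetical file): built and loaded on its own ----
std::string plotOutputPath(const PlotRequest& req, const std::string& outDir) {
  // Stand-in for the real plotting work done by createTH1FPlot().
  return outDir + "/" + req.outName + ".png";
}

// ---- ReleaseComparison.cpp: includes PlotHelpers.h, sees no helper bodies ----
std::vector<std::string> runComparison(const std::string& outDir) {
  std::vector<std::string> written;
  // Placeholder histogram paths, not real DQM folders.
  written.push_back(plotOutputPath({"dirA/histA", "histA"}, outDir));
  written.push_back(plotOutputPath({"dirB/histB", "histB"}, outDir));
  return written;
}
```

In a ROOT session the helper file would presumably be loaded first (.L PlotHelpers.cpp+) and the main file afterwards (.L ReleaseComparison.cpp+). Since the main file then only sees declarations, the compiler cannot inline the helper bodies into every call site; whether that actually shortens the build depends on inlining really being the bottleneck.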
I'll tag @pcanal in case he would have any ideas. With the simplified recipe maybe the ROOT forum would be useful as well.
assign pdvm
Given the relation to validation, pdmv is probably a closer match than dqm (sorry for the noise)
unassign dqm
If the problem is indeed in the time gcc takes to compile the source code, then splitting it is the only option. If the issue had been in the dictionary generation step (rootcling being called by ACLiC), then an option would have been to split the file into function forward declarations and class declarations, and hide the implementation from Cling (behind #ifndef __CLING__).
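For illustration, the guard pattern mentioned above could look roughly like this. The function comparisonLabel() is a hypothetical stand-in and not taken from ReleaseComparison.cpp; the sketch only assumes that __CLING__ is defined while Cling or rootcling parses the file and undefined for the regular compiler.

```cpp
#include <string>

// Declarations stay visible to everyone, including Cling/rootcling,
// which only need the signatures.
std::string comparisonLabel(const std::string& oldRelease,
                            const std::string& newRelease);

#ifndef __CLING__
// Implementation: skipped whenever __CLING__ is defined, so only the
// regular compiler (e.g. gcc) has to process the function bodies.
std::string comparisonLabel(const std::string& oldRelease,
                            const std::string& newRelease) {
  return oldRelease + "_vs_" + newRelease;
}
#endif
```

As noted above, this only helps if the time is spent in dictionary generation; it does nothing for the time the compiler itself needs.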
assign pdmv
New categories assigned: pdmv
@AdrianoDee,@sunilUIET,@miquork you have been requested to review this Pull request/Issue and eventually sign? Thanks
@pcanal, is there an easy way to change the optimization level ACLiC uses?
In the past I've frobbed the compile options via
gSystem->GetMakeSharedLib()/gSystem->SetMakeSharedLib();
gSystem->GetMakeExe()/gSystem->SetMakeExe();
@acmbulla
what do you see with top for the process(es) related to makeValidationPlots.sh?
Is the CPU utilization high? (sometimes file reading may be slow .. or the search paths could explode and the system takes forever to get to related dependencies during compilation/processing)
is there an easy way to change the optimization level ACLiC uses?
You select the debug mode by passing 'g' when loading the scripts (either as the option to CompileMacro or after the trailing +).
You can also customize the value used (e.g. use -O1) with:
gSystem->SetFlagsDebug( ... ); // See GetFlagsDebug
gSystem->SetFlagsOpt( ... ); // See GetFlagsOpt
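As a sketch of how the argument for SetFlagsOpt could be built, the helper below swaps the optimization level in an existing flag string. withOptLevel() is a hypothetical name introduced here, and the code is plain C++ with no ROOT dependency; it assumes the flags are whitespace-separated tokens.

```cpp
#include <sstream>
#include <string>

// Replace every "-O<something>" token in a whitespace-separated flag string
// with `level` (e.g. "-O1"); append `level` if no such token is present.
std::string withOptLevel(const std::string& flags, const std::string& level) {
  std::istringstream in(flags);
  std::string token;
  std::string out;
  bool replaced = false;
  while (in >> token) {
    if (token.rfind("-O", 0) == 0) {  // token starts with "-O"
      token = level;
      replaced = true;
    }
    if (!out.empty()) out += ' ';
    out += token;
  }
  if (!replaced) {
    if (!out.empty()) out += ' ';
    out += level;
  }
  return out;
}
```

In a ROOT session this could then be used as gSystem->SetFlagsOpt(withOptLevel(gSystem->GetFlagsOpt(), "-O1").c_str()); before compiling the macro in optimized mode, so that any other flags ROOT was configured with are kept.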
@slava77
Well, it starts with rootcling, which takes no more than 50% of CPU, but as soon as cc1plus kicks in, it takes a stable 95-100% of CPU. Does that help?
Well, it starts with rootcling, which takes no more than 50% of CPU, but as soon as cc1plus kicks in, it takes a stable 95-100% of CPU. Does that help?
I guess so; it looks like good evidence that the file is too complex to compile.
As a simple check, I can suggest commenting out the rest of the main function after the first few calls to createTH1FPlot and seeing if it runs much faster. This would indicate whether splitting the script is going to be practical.
OTOH, based on Matti's check of different optimization flags, I'm guessing a switch to -O1 would be fine, assuming a compiled version was running in a reasonable time before (this less optimized version will probably be about 2x slower, if not less).
Yeah, leaving just the first createTH1FPlot and createTProfPlot calls, it takes ~30 seconds to start and then executes the code at the usual (not very efficient) speed. So I guess I can work on splitting things up.
Dear all,
Since we moved to CMSSW 14 I've been unable to use my private script to do RelVal comparisons. The script is old; essentially it is a simple macro which runs over defined paths in two ROOT files and saves the plots.
There is some Python wrapper to make things easier but you can just follow these steps to replicate the errors I'm facing. I've always used the framework in CMSSW_11_1_4 (for other reasons, I'm familiar with that env), but if you try to use root6.20 to open a root6.30 file you will encounter a segmentation violation error (case1). If you try to use CMSSW_14_0_0, to be consistent with the files you are using, you will be stuck and the macro will not compile (case2).
To convince you that this is a root-related problem, in CMSSW_11_1_4, you are free to also launch the comparison with some random CMSSW_13 files (case3).
Case 1 and 3:
$ cmsrel CMSSW_11_1_4
$ cd src
$ cmsenv
$ git clone -b bug_report git@github.com:CMSTrackingPOG/TkRelVal.git
$ cd TkRelVal
$ ./debug_test.sh
After that, you should have the ROOT files in collisions/DQM. If not, please download [1,4] and put them into collisions/DQM.
Case 2 (stuck):
$ cmsrel CMSSW_14_0_0
$ cd src
$ cmsenv
$ git clone -b bug_report git@github.com:CMSTrackingPOG/TkRelVal.git
$ cd TkRelVal
$ ./debug_test.sh
After that, you should have the ROOT files in collisions/DQM. If not, please download [1,4] and put them into collisions/DQM.
**** If you are not able to download the required file, please download:
1) https://cmsweb.cern.ch/dqm/relval/data/browse/ROOT/RelValData/CMSSW_13_3_x/DQM_V0001_R000369978__ZeroBias__CMSSW_13_3_0_pre1-130X_dataRun3_Prompt_frozen_v4_RelVal_2023D-v1__DQMIO.root
2) https://cmsweb.cern.ch/dqm/relval/data/browse/ROOT/RelValData/CMSSW_13_3_x/DQM_V0001_R000369978__ZeroBias__CMSSW_13_3_0_pre2-132X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root
3) https://cmsweb.cern.ch/dqm/relval/data/browse/ROOT/RelValData/CMSSW_14_0_x/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root
4) https://cmsweb.cern.ch/dqm/relval/data/browse/ROOT/RelValData/CMSSW_14_0_x/DQM_V0001_R000369978__ZeroBias__CMSSW_14_0_0_pre0_ROOT630-133X_dataRun3_Prompt_frozen_v1_RelVal_2023D-v1__DQMIO.root