cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

High RSS memory increase for Full (Fast) Simulation in EL8 compared to SLC7 #42929

Open nhoerman opened 1 year ago

nhoerman commented 1 year ago

Testing the Full Simulation SIM step (e.g. MinBias, 500 events, 1 thread) on both the EL8 platform and SLC7, there seems to be a significant increase in RSS memory consumption on EL8.

Comparing cmsRun to cmsRunGlibC and cmsRunTC:
- same RSS memory increase using cmsRunTC
- less RSS memory consumption using cmsRunGlibC

Used servers: olsky-05 (CS8 with singularity cmssw-el8) and olsky-06 (SLC7).

Attachment: FullSim_cmsRuns.pdf

cmsbuild commented 1 year ago

A new Issue was created by @nhoerman .

@antoniovilela, @Dr15Jones, @sextonkennedy, @makortel, @smuzaffar, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 year ago

Assign core, simulation

cmsbuild commented 1 year ago

New categories assigned: core,simulation

@Dr15Jones,@civanch,@makortel,@mdhildreth,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

Could you provide a recipe to reproduce the setup (e.g. cmsDriver command(s))?

I tested 10 events with step 1 of workflow 12434.0 (2023 TTbar) on cmsdev32 (slc7), both directly and through the cmssw-el8 container, and didn't see any significant difference: the peak RSS was 1035 MB in both cases, as reported by the SimpleMemoryCheck service.

nhoerman commented 1 year ago

I use TimeMemoryInfo.py (GEN-SIM step):

cmsDriver.py MinBias_14TeV_pythia8_TuneCP5_cfi --conditions auto:phase1_2022_realistic -n 10 --nThreads 1 --era Run3 --eventcontent FEVTDEBUG --relval 10000,100 -s GEN,SIM --customise=Validation/Performance/TimeMemoryInfo.py --pileup=NoPileUp --datatier GEN-SIM --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --geometry DB:Extended --dirout=./ --mc

Results on olsky-05/06:
- SLC7: Peak rss size 1004.9 Mbytes
- EL8: Peak rss size 1285.03 Mbytes
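To make the comparison easier to repeat, here is a minimal Python sketch that pulls the peak RSS out of a job log and computes the relative increase. It assumes the summary line looks exactly like the ones quoted above ("Peak rss size ... Mbytes"); the helper names `peak_rss_mb` and `relative_increase` are made up for this illustration and are not part of CMSSW.

```python
import re

# Summary line as quoted in this thread, e.g. "Peak rss size 1004.9 Mbytes".
# Assumption: the real log uses the same wording; adjust the pattern if not.
PEAK_RSS = re.compile(r"Peak rss size\s+([0-9]*\.?[0-9]+)\s+Mbytes")


def peak_rss_mb(log_text):
    """Return the last peak RSS value (in MB) found in log_text, or None."""
    matches = PEAK_RSS.findall(log_text)
    return float(matches[-1]) if matches else None


def relative_increase(baseline_mb, other_mb):
    """Fractional increase of other_mb with respect to baseline_mb."""
    return (other_mb - baseline_mb) / baseline_mb


# Values reported above for olsky-06 (SLC7) and olsky-05 (EL8).
slc7 = peak_rss_mb("SLC7: Peak rss size 1004.9 Mbytes")
el8 = peak_rss_mb("EL8: Peak rss size 1285.03 Mbytes")
print(f"SLC7 {slc7} MB, EL8 {el8} MB, increase {relative_increase(slc7, el8):.1%}")
```

With the two numbers above, the EL8 peak RSS is roughly 28% higher than the SLC7 one.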

makortel commented 1 year ago

Thanks. I ran a couple of tests on lxplus.

So on a quick look the behavior seems to be related to the actual OS version of the node rather than to our Apptainer container.

makortel commented 1 year ago

I ran MALLOC_CONF=stats_print:true cmsRun <config> on both slc7 and el8 nodes. I didn't really learn much from the printouts (except that there are some differences), but I'm attaching them here in case anyone else is able to understand them better.

Attachments: slc7_jemalloc_stats.txt, el8_jemalloc_stats.txt
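For anyone who wants to collect the same kind of printout on several nodes, the following is a small Python sketch of a wrapper that sets `MALLOC_CONF=stats_print:true` for a child process and saves its standard error to a file. It assumes that jemalloc writes its statistics to standard error at exit (its default behavior); the function name `run_with_jemalloc_stats` and the file names in the usage comment are hypothetical.

```python
import os
import subprocess
import sys


def run_with_jemalloc_stats(command, stats_file):
    """Run command with MALLOC_CONF=stats_print:true, saving stderr to stats_file.

    Assumption: the executable is linked against jemalloc, which prints its
    statistics to stderr when the process exits. Other stderr output of the
    job ends up in the same file.
    """
    env = dict(os.environ, MALLOC_CONF="stats_print:true")
    with open(stats_file, "w") as out:
        result = subprocess.run(command, env=env, stderr=out, check=False)
    return result.returncode


# Hypothetical usage (config and output file names are placeholders):
# run_with_jemalloc_stats(["cmsRun", "step1_cfg.py"], "el8_jemalloc_stats.txt")
```

Only the environment of the child process is changed, so the calling shell is left untouched and the same script can be run on slc7 and el8 nodes in turn.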

makortel commented 1 year ago

For future reference, the CMSSW version used in the PDF attached in the description and in my tests was 13_3_0_pre3.

makortel commented 1 year ago

I did some more testing of the peak RSS on random lxplus nodes:

| container | cc7 node | el8 node | el9 node |
|---|---|---|---|
| cmssw-cc7 | 1048.26 MB | 987.289 MB | 1278.88 MB |
| cmssw-el8 | 976.453 MB | 990.133 MB | 1287.62 MB |
| cmssw-el9 | 1052.91 MB | 998.445 MB | 1290.95 MB |

Interestingly, and contrary to my previous test https://github.com/cms-sw/cmssw/issues/42929#issuecomment-1747293925, the EL8-node RSS was now compatible with CC7-node RSS (the node was coincidentally the same as in my previous test).

Perhaps there is a random element in play? (similar to effects discussed in https://github.com/cms-sw/cmssw/issues/42387)
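As a rough way to read the table, the Python sketch below groups the measurements by node OS and by container and prints the mean and spread of each group. The numbers are copied from the table; each cell is a single measurement, so this only illustrates the trend and is not a statistical test. The helper name `by_node` is made up for this example.

```python
from statistics import mean

# Peak RSS in MB, copied from the table above: PEAK_RSS[container][node OS].
PEAK_RSS = {
    "cmssw-cc7": {"cc7": 1048.26, "el8": 987.289, "el9": 1278.88},
    "cmssw-el8": {"cc7": 976.453, "el8": 990.133, "el9": 1287.62},
    "cmssw-el9": {"cc7": 1052.91, "el8": 998.445, "el9": 1290.95},
}


def by_node(table):
    """Group the measurements by node OS, across all containers."""
    nodes = {}
    for row in table.values():
        for node, rss in row.items():
            nodes.setdefault(node, []).append(rss)
    return nodes


# Variation across containers for a fixed node OS.
for node, values in by_node(PEAK_RSS).items():
    spread = max(values) - min(values)
    print(f"{node} node: mean {mean(values):.1f} MB, spread {spread:.1f} MB")

# Variation across node OS for a fixed container.
for container, row in PEAK_RSS.items():
    values = list(row.values())
    spread = max(values) - min(values)
    print(f"{container}: mean {mean(values):.1f} MB, spread {spread:.1f} MB")
```

For a fixed node OS the containers agree within about 80 MB, while for a fixed container the node OS changes the peak RSS by roughly 240 to 310 MB, with the el9 node consistently highest in these runs.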