Dr15Jones closed this issue 1 year ago.
Assign simulation
New categories assigned: simulation
@mdhildreth,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @Dr15Jones Chris Jones.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
@abdoulline, can you please comment?
@civanch @Dr15Jones Thanks for bringing this issue to our (HCAL) attention. I'm taking the liberty of including @bsunanda in the thread, and will alert the actual HFShowerLibrary code maintainer/developer, Lev Kheyn (@kheyn).
I believe the earlier version had a fixed number of events (5000) for each momentum bin. Now it does not. In this case I would prefer that the exact number of events for each momentum bin, for both the EM and hadron parts, be stored in the file and used. This will avoid going beyond the limit and keep the size of the file under control.
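A minimal C++ sketch of what "store and use the exact number of events per bin" could look like, assuming 1-based entry numbers as in the HFShowerLibrary.cc snippet quoted later in the thread. The class `ShowerBinIndex` is hypothetical and not part of CMSSW. Each bin's first entry is derived from a running sum of the per-bin counts, so one short bin does not shift the bins after it.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of CMSSW: keeps the exact number of showers
// stored for each momentum bin instead of assuming one fixed evtPerBin_.
class ShowerBinIndex {
 public:
  explicit ShowerBinIndex(std::vector<int> countsPerBin)
      : counts_(std::move(countsPerBin)), first_(counts_.size()) {
    // Running sum: first_[j] is the 1-based entry of the first shower in bin j.
    int next = 1;
    for (std::size_t j = 0; j < counts_.size(); ++j) {
      first_[j] = next;
      next += counts_[j];
    }
    total_ = next - 1;
  }

  // Maps a uniform random number r in [0, 1) to a 1-based entry inside bin j;
  // std::min guards against r * count rounding up to count.
  int entry(std::size_t j, double r) const {
    const int offset = std::min(static_cast<int>(counts_[j] * r), counts_[j] - 1);
    return first_[j] + offset;
  }

  int firstEntry(std::size_t j) const { return first_[j]; }
  int lastEntry(std::size_t j) const { return first_[j] + counts_[j] - 1; }
  int totalEntries() const { return total_; }

 private:
  std::vector<int> counts_;  // showers per momentum bin
  std::vector<int> first_;   // first 1-based entry of each bin
  int total_ = 0;            // total number of showers in the branch
};
```

With 16 bins of 5000 this reproduces the fixed `jk * evtPerBin_ + 1 + int(evtPerBin_ * r)` indexing. Setting the count of bin 3 to 4881 (the 7 GeV deficit discussed below, as inferred from the thread) gives the shifted layout directly, with no special-cased offset.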
Sorry if I add even more confusion...
If I remember it right, the aforementioned Run2-specific file HFShowerLibrary_npmt_noatt_eta4_16en_v4.root
was produced 6 years ago by David Lange by reformatting the previous ("original/initial")
HFShowerLibrary_npmt_noatt_eta4_16en_v3.root
(generated by Lev Kheyn and removed back then from data/SimG4CMS/Calo/data)
to get better CPU performance:
https://github.com/cms-sw/cmssw/pull/16049
(and the code which reads it was modified back then as well).
But it should have been a purely technical rearrangement (not affecting N_entries).
Edit: so, in the end, this PR may not be relevant for the discussed issue...
So there are three files in
/cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_0_pre5/external/el8_amd64_gcc10/data/SimG4CMS/Calo/data/
-rw-r--r-- 1 cvmfs cvmfs 343223032 Nov 28 2021 HFShowerLibrary_npmt_noatt_eta4_16en_v4.root
-rw-r--r-- 1 cvmfs cvmfs 190776418 Nov 28 2021 HFShowerLibrary_oldpmt_noatt_eta4_16en_v3.root
-rw-r--r-- 1 cvmfs cvmfs 1249815228 Nov 28 2021 HFShowerLibrary_run3_v6.root
It seems they correspond to Run 1 (v3), Run2 (v4) and Run3 (v6). The number of entries in each file's TTree is:
v3: 160,000 entries (hadParticles is empty up until entry 80,000, but emParticles has non-empty entries even beyond 80,000)
v4: 159,881 entries (hadParticles is empty up until entry 79,882, but emParticles has non-empty entries even beyond 79,882)
v6: 159,999 entries (both hadParticles and emParticles have non-empty entries starting at 0)
Half of each branch in the RUN2 library is empty, namely the second half of the Em branch and the first half of the Had branch (for Had, that looks like an offset). That happens because, in the usual mode, the whole tree is read. That is avoided in the RUN3 library, where the Em and Had branches are read separately.
Because one half of each branch is empty, each branch in the RUN2 library should contain 16x2x5000=160000 entries. While looking at the library in TBrowser, I see 159881 entries for each branch. But when I check the content of the library for each energy, I see exactly 5000 showers for every energy.
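The bookkeeping behind these numbers, written out (16 momentum points, an EM half and a hadronic half, 5000 showers per point, compared with the count seen in TBrowser):

$$
N_{\text{expected}} = 16 \times 2 \times 5000 = 160\,000, \qquad N_{\text{expected}} - N_{\text{observed}} = 160\,000 - 159\,881 = 119
$$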
The reasons for that contradiction are under study now.
There really is a deficit of 119 entries in the RUN2 library.
I reformatted the library. Now it has exactly 8000 entries in each branch, no offset, and improved performance.
@kheyn you mean 80,000 entries in each separate (in new version) branch.
Probably it's worth adding that all missing (119) entries are for the energy point e=7 GeV. And that when the actual Run2 library is used, starting from 7 GeV
So when @kheyn compares the average HF response (photoelectrons per GeV) for both electrons and pions (separately in Long and Short fibers) in two cases:
(1) the newly generated (aforementioned) library (labelled as "modified" in the plots)
(2) HFShowerLibrary_npmt_noatt_eta4_16en_v4.root (labelled as "actual" in the plots)
then the overall (mean) difference is rather small (sub-percent for most of the points, except the last two).
ELECTRONS: Nphot_toe_deflib-fillbranch_ls_e16.pdf
Hi @Dr15Jones @civanch
I guess that there are the following ways/options to proceed:
(0) do nothing, i.e. preserve the reproducibility of Run2 HF results, although there is already a (small) Run2 MC reproducibility break due to the fix https://github.com/cms-sw/cmssw/pull/39967 (which changes the GEANT history)
(1) apply an "ugly patch" [1] to exclude the missing 119 entries from 7 GeV, so that all other "subsequent" energy points use their appropriate entries (5000) and give fully correct results with the actual library (v4); requires only a minimal code update
(2) replace the actual v4 by a new v5 (+ "fileVersion=1") with the 119 entries for the point e=7 GeV recovered
and containing full separate branches for electrons and pions, generated by @kheyn as he reported earlier today;
requires:
(i) a request for adding the new library to the CMSSW SDT repository,
(ii) small code (HFShowerLibrary.cc) + config changes:
https://github.com/cms-sw/cmssw/compare/master...abdoulline:cmssw:HFShowerLibrary_fix_Run2
NB: I've quickly tried 2018 TTbar MC (runTheMatrix.py -l 10824.0) with this option and it runs OK (at least) on 10 ev.
(3) replace the Run2 v4 in question by the actual (improved) Run3 v6, which means backporting the Run3 library usage to the Run2 era (a ~2-5% difference may be expected in global HF variables); requires a small config change
[1] Ugly patch: replacing the following fragment: https://github.com/cms-sw/cmssw/blob/master/SimG4CMS/Calo/src/HFShowerLibrary.cc#L593-#L600
if (j == nMomBin_ - 2) {
  irc[1] = int(evtPerBin_ * 0.5 * r);
} else {
  irc[1] = int(evtPerBin_ * r);
}
irc[1] += (j + 1) * evtPerBin_ + 1;
r = G4UniformRand();
irc[0] = int(evtPerBin_ * r) + 1 + j * evtPerBin_;
With:
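static const int SHIFT_PATCH = 119; // in .h: entries missing from the 7 GeV point of the v4 library
for (int k = 0; k < 2; ++k) { // k = 0 fills irc[0] (bin j), k = 1 fills irc[1] (bin j + 1)
  double r = G4UniformRand();
  int jk = j + k;
  int ir = jk * evtPerBin_ + 1; // nominal first (1-based) entry of bin jk
  if (fileVersion_ >= 2) {
    ir += int(evtPerBin_ * r); // fileVersion_ >= 2: no correction applied
  } else {
    if (jk < 3) {
      ir += int(evtPerBin_ * r); // bins below 7 GeV are unaffected
    } else if (jk == 3) {
      ir += int((evtPerBin_ - SHIFT_PATCH) * r); // 7 GeV bin holds only evtPerBin_ - SHIFT_PATCH showers
    } else {
      ir += int(evtPerBin_ * r) - SHIFT_PATCH; // higher bins are stored SHIFT_PATCH entries earlier
    }
  }
  irc[k] = ir;
}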
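To check the patch arithmetic outside CMSSW, here is a standalone C++ restatement of its `fileVersion_ < 2` branch. Assumptions, not taken from the CMSSW sources: `evtPerBin_` is 5000, the short bin is `jk == 3` (the 7 GeV point), and the argument `r` stands in for `G4UniformRand()`. The function names `patchedEntry` and `patchedPair` are illustrative only.

```cpp
#include <array>

// Standalone restatement of the "ugly patch" index arithmetic (fileVersion_ < 2
// branch). Assumed constants: evtPerBin_ = 5000, short bin jk == 3 (7 GeV).
constexpr int kEvtPerBin = 5000;
constexpr int kShiftPatch = 119;  // entries missing from the 7 GeV point
constexpr int kShortBin = 3;      // bin holding kEvtPerBin - kShiftPatch showers

// 1-based library entry for momentum bin jk, given r in [0, 1).
int patchedEntry(int jk, double r) {
  int ir = jk * kEvtPerBin + 1;  // nominal first entry of bin jk
  if (jk < kShortBin) {
    ir += static_cast<int>(kEvtPerBin * r);  // unaffected bins
  } else if (jk == kShortBin) {
    ir += static_cast<int>((kEvtPerBin - kShiftPatch) * r);  // shortened 7 GeV bin
  } else {
    ir += static_cast<int>(kEvtPerBin * r) - kShiftPatch;  // later bins shifted down
  }
  return ir;
}

// Entries for the neighbouring bins j and j + 1, i.e. irc[0] and irc[1] in the patch.
std::array<int, 2> patchedPair(int j, double r0, double r1) {
  return {patchedEntry(j, r0), patchedEntry(j + 1, r1)};
}
```

Under these assumptions every index stays within 1..79881, and the 7 GeV bin and the bins above it each map onto exactly their own stored showers.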
The deficit of 119 entries, occurring at 7 GeV, results in a shift of the entries in the branches at higher energies. That shift mostly affects the last energy point in the branch: when an entry is requested for an electron or hadron of 1 TeV, a missing entry may be read. No crash occurs in that case; the returned result is zero photons. I attach the distribution of the number of photons in long fibers for an electron of 1 TeV. nphot_distr2_long_el.pdf
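A rough C++ check of how often the unpatched, fixed-5000 indexing overshoots. It assumes the layout inferred from this thread: bin 3 (7 GeV) holds 4881 showers and every later bin is stored 119 entries earlier. This is a hypothetical sketch, not CMSSW code. Under that assumption, 119 of the 5000 possible offsets in each bin from 3 upward request an entry past the end of that bin. For the last bin (1 TeV), those requests land beyond the last filled entry, which would be consistent with the zero-photon showers above.

```cpp
// Hypothetical check, not CMSSW code: requests made by the unpatched,
// fixed-5000 indexing when bin 3 is 119 showers short.
constexpr int kBins = 16;
constexpr int kEvtPerBin = 5000;
constexpr int kMissing = 119;
constexpr int kShortBin = 3;
constexpr int kLastFilled = kBins * kEvtPerBin - kMissing;  // 79881

// 1-based entry requested by the unpatched code for in-bin offset k in [0, kEvtPerBin).
constexpr int nominalEntry(int jk, int k) { return jk * kEvtPerBin + 1 + k; }

// Last 1-based entry that actually belongs to bin jk in the shortened library.
constexpr int actualLastEntry(int jk) {
  return jk < kShortBin ? (jk + 1) * kEvtPerBin : (jk + 1) * kEvtPerBin - kMissing;
}

// Number of offsets k whose requested entry lies beyond the end of bin jk.
constexpr int overshootingDraws(int jk) {
  int count = 0;
  for (int k = 0; k < kEvtPerBin; ++k) {
    if (nominalEntry(jk, k) > actualLastEntry(jk)) {
      ++count;
    }
  }
  return count;
}

// Fraction of uniformly drawn offsets that overshoot (119 / 5000 for bins >= 3).
constexpr double overshootingFraction(int jk) {
  return static_cast<double>(overshootingDraws(jk)) / kEvtPerBin;
}
```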
@Dr15Jones, @abdoulline, @kheyn, it is good that the situation is understood. What to do next? For me there are only two optimal options:
1) "do nothing", because a huge number of MC events have already been produced;
2) make the modification proposed by Salavat as (1) "exclude 119 entries". It would be fine if this code could be enabled by a flag; we would introduce a modifier which enables the flag via customisation.
On the other hand, if it is possible to further improve the library/code for Run3, we should do our best. In 2023 I would expect a change of simulation, so a new library file and updated software would be fine.
@civanch I'm not sure I understand your remark about further improving the library/code for Run3.
The Run3 HFShowerLibrary file (as presented in July-September 2021 in the Simulation meetings) has
very good CPU performance, on a par with previous versions;
I believe Chris has just polished the HFShowerLibrary (unpacking/reading) code per se.
About your second preferred (optimal?) option: to avoid any misunderstanding, it does not keep the HF response intact either.
@abdoulline, maybe I was imprecise. I mean that if any new improvement is proposed for Run3, we can integrate it without a problem. Concerning Run2, whatever we do, we change the SIM history. In that case, we will be asked to provide a modifier for the legacy release. Such a modifier will not be needed for the current master, so if we run Run2 simulation in the current master or later, we will use the improved library or code.
@civanch
I've put together a branch with the option (1) "ugly patch"
https://github.com/cms-sw/cmssw/compare/master...abdoulline:cmssw:HF_patch_Run2
so that the patch can be applied by process.HFLibraryFileBlock.ApplyLibFix = True
I've tried it with (default) False and True in the Run2 TTbar wf (10 ev)
and got slightly different results.
@abdoulline, this is a safe approach, let us do this PR. I would guess it should be backported to 10_6_X; please check whether the code of the modified classes changed between 10_6 and 13_0. I do not think we need any other backport.
@civanch I'm personally not a big fan of this "ugly patch"...
@bsunanda is in favor of a full-fledged fix, if I understand him correctly
https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1334894972
Not sure our framework colleagues are big fans of this kind of "hacking"...
And about 10_6_X: why go that far back?
There have been a lot of changes since then in all the HFShowerLibrary-related code/configs, including the "unification/sync." of parameters with FastSim...
@civanch (cc @Dr15Jones) may I suggest that this issue and its possible fixing options https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1350742388 be discussed not just between you, Vladimir, and me, but in a broader circle, like the Software release meeting or the Simulation development forum? To get insight from the framework colleagues, for instance?
Let us discuss it at the Friday meeting at 15:30 CET. There will be a limited number of people, so it is equivalent to a chat between us. We may put this as an item at the end or at the beginning of the meeting, or discuss it within News. For me, a fixed new library file for Run2 would also be a good solution.
10_6_X is the Run2 legacy MC production release, whatever is needed for Run2 analysis should be backported there.
OK, let me put together several slides for Friday, summarizing what's listed/shown in the posts here.
Could you, please, add an item to the agenda after the News?
"Run2 HF shower library issue"
Undertaken action: https://github.com/cms-sw/cmssw/pull/40357
@abdoulline, maybe https://github.com/cms-sw/cmssw/pull/40357 needs to be backported to 10_6_X
@civanch there were different configurations back then (still separate for FullSim and FastSim) and no Run3 library at all;
the previous library unpacking/reading code was quite different. And I suppose that everything we implemented (with Lev and Sunanda) last year in 12_X specifically for Run3 and, on top of that, recently by Chris Jones in his PR https://github.com/cms-sw/cmssw/pull/40278 is hardly compatible with the old release.
So, it's not about backporting per se, but rather about re-implementing (+ reverse adapting) the new Run3 stuff back in 10_6_X. And I'm afraid I'm not available to consider it in the near future...
For 10_6_X the least-intrusive (= minimal and transparent) option would be the aforementioned (1) "ugly patch" (adapted to 10_6_X FastSim and FullSim configs and code) https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1352722374
@abdoulline, thanks for the detailed answer. It seems a backport is possible but a heavy task. Let us stop at this point.
+1
This issue is fully signed and ready to be closed.
Lev has now provided a new file for Run3, HFShowerLibrary_run3_v7.root; request to add it to CMSDIST: https://github.com/cms-sw/cmsdist/issues/8288
@cmsbuild, please close
While doing some refactoring of the HFShowerLibrary code, I came across this handling of the hadronic particles
https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L362
this is used with the file
SimG4CMS/Calo/data/HFShowerLibrary_npmt_noatt_eta4_16en_v4.root
which has separate emParticles and hadParticles branches, therefore there is no reason to offset the start value for the hadParticles. I believe this was due to a cut-n-paste of the previous handling of hadParticles seen in the lines above: https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L356-L357
In addition, the configuration used when reading that file https://github.com/cms-sw/cmssw/blob/d8a49b979b5ed82b5bb79296713fb485ce9b2344/Geometry/HcalSimData/python/HFParameters_cff.py#L29 causes totEvents to be the hardcoded value of 5000*16=80000. https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L426-L428
The problem with that is that the file contains 159881 entries in each of the TBranches, which means that for large record numbers the hadronic call could ask for data beyond the end of the TBranch.
Looking at the file contents, I see that the TBranch hadParticles has empty elements up until entry 79882. The emParticles branch is the opposite: it is always empty after 79880. So the offset is only needed because the file was written in a weird way.
So it looks like there is a lot of wasted space in the file and still the possibility of reading off the end.
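A hypothetical C++ sketch of the read-off-the-end case with an explicit bounds check. It is not the CMSSW reader: it assumes the hadronic lookup is simply `totEvents + record - 1`, with a 1-based record number in 1..80000 and a 0-based TBranch entry. The real conversion at the linked lines may differ. `BranchInfo` and `hadronicEntry` are illustrative names. Under these assumptions, exactly the last 119 records (79882..80000) fall past the 159881 stored entries.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch, not the CMSSW reader: hadronic lookups add the hardcoded
// totEvents offset before reading hadParticles; a bounds check makes a read
// past the end explicit instead of silently yielding an empty shower.
struct BranchInfo {
  std::int64_t entries;  // number of entries stored in the TBranch, e.g. 159881
};

constexpr std::int64_t kTotEvents = 16 * 5000;  // hardcoded totEvents = 80000

// Assumed convention: record is 1-based; the returned branch entry is 0-based.
// Returns std::nullopt when the entry would fall outside the branch.
std::optional<std::int64_t> hadronicEntry(std::int64_t record, const BranchInfo& branch) {
  const std::int64_t entry = kTotEvents + record - 1;
  if (record < 1 || entry >= branch.entries) {
    return std::nullopt;
  }
  return entry;
}
```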