cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.07k stars 4.29k forks source link

Weirdness in HFShowerLibrary handling of run 2 file #40218

Closed Dr15Jones closed 1 year ago

Dr15Jones commented 1 year ago

While doing some refactoring of the HFShowerLibrary code, I came across this handling of the hadronic particles

https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L362

this is used with the file

SimG4CMS/Calo/data/HFShowerLibrary_npmt_noatt_eta4_16en_v4.root

which has separate emParticles and hadParticles branches therefore there is no reason to offset the start value for the hadParticles. I believe this was due to a cut-n-paste of the previous handling of hadParticles seen in the lines above

https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L356-L357

In addition, the configuration used when reading that file https://github.com/cms-sw/cmssw/blob/d8a49b979b5ed82b5bb79296713fb485ce9b2344/Geometry/HcalSimData/python/HFParameters_cff.py#L29

causes the totEvents to be the hardcoded value of 5000*16=8000. https://github.com/cms-sw/cmssw/blob/e0f38a318f32aa6f92ec665531961ec643f70c2a/SimG4CMS/Calo/src/HFShowerLibrary.cc#L426-L428

The problem with that is the file contains 159881 entries in each of the TBranches which means for large record numbers the hadronic call could ask for data beyond the end of the TBranch.

Looking at the file contents, I see that the TBranch hadParticles has empty elements up until entry 79882. The emParticles is opposite, where it is always empty after 79880. So the offset is only needed because the file was written in a weird way.

So it looks like there is lots of wasted space in the file and still the possibility of reading off the end.

Dr15Jones commented 1 year ago

Assign simulation

cmsbuild commented 1 year ago

New categories assigned: simulation

@mdhildreth,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 year ago

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

civanch commented 1 year ago

@abdoulline , can you, please, comment.

abdoulline commented 1 year ago

@civanch @Dr15Jones Thanks for bringing this issue to our (HCAL) attention. I take the liberty to include @bsunanda in the thread. And will alert the actual HFShowerLibrary code maintainer/developer Lev Kheyn @kheyn .

bsunanda commented 1 year ago

I believe the earlier version had a fixed # of events (5000) for each momentum bin. Now it does not have. In this case I would prefer if the exact # of events for each momentum bin for EM and Hadron parts be there and used. This will avoid going beyongthe limit and control the size of the file.

abdoulline commented 1 year ago

Sorry, if I'll add even more confusion...

If I remember it right, the aforementioned Run2-specific file HFShowerLibrary_npmt_noatt_eta4_16env4.root has been produced 6 years ago by David Lange by reformatting the previous ("original/initial")
HFShowerLibrary
npmt_noatt_eta4_16en_v3.root
(generated by Lev Kheyn and which has been removed back then from data/SimG4CMS/Calo/data ) to get better CPU performance: https://github.com/cms-sw/cmssw/pull/16049 (and the code which reads it was modified back then as well). But it should have been some purely technical re-arrangement (not affecting N_entries).

Edited: so, at the end this PR may not be relevant for the discussed issue...

Dr15Jones commented 1 year ago

So there are three files in /cvmfs/cms.cern.ch/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_0_pre5/external/el8_amd64_gcc10/data/SimG4CMS/Calo/data/

-rw-r--r-- 1 cvmfs cvmfs  343223032 Nov 28  2021 HFShowerLibrary_npmt_noatt_eta4_16en_v4.root
-rw-r--r-- 1 cvmfs cvmfs  190776418 Nov 28  2021 HFShowerLibrary_oldpmt_noatt_eta4_16en_v3.root
-rw-r--r-- 1 cvmfs cvmfs 1249815228 Nov 28  2021 HFShowerLibrary_run3_v6.root

It seems they correspond to Run 1 (v3), Run2 (v4) and Run3 (v6). The number of entries in each file's TTree are

v3 : 160,000 entries (hadParticles is empty up until entry 80,000 but emParticles has non-empty entries even beyond 80,000
v4 : 159,881 entries (hadParticles is empty up until entry 79,882 but emParticles has non-empty entries even beyond that 79,882
v6 : 159,999 entries (both hadParticles and emParticles have non-empty entries starting at 0)
kheyn commented 1 year ago

Halves of branches in RUN2 library are empty, namely, the second half of Em branch and the first half of Had branch (for Had that looks like offset). That happens because, in a usual mode, the whole tree is read. That is avoided in RUN3 library, where Em and Had branches are read separately.

Because of emptiness of one half of branches, the sizes of branches in RUN2 library should be 16x2x5000=160000. While looking at library in TBrowser I see 159881 entries for each branch. But when I check content of library for each energy, I see exactly 5000 showers for every energy.

The reasons of that contradiction are under study now.

kheyn commented 1 year ago

There is really deficit of 119 entries in RUN2 library.

I reformatted the library. Now it has exactly 8000 entries in each brunch, no offset, and improved performance.

abdoulline commented 1 year ago

@kheyn you mean 80,000 entries in each separate (in new version) branch.

Probably it's worth adding that all missing (119) entries are for the energy point e=7 GeV. And that when the actual Run2 library is used, starting from 7 GeV

So that when @kheyn compares average HF response (photoelectrons per GeV) both for electrons and pions (separately in Long as Short fibers) in two cases: (1) newly generated (aforementioned) library (labeled as "modified" in plots)
(2) HFShowerLibrary_npmt_noatt_eta4_16en_v4.root (labelled as "actual" in plots) Then the overall (mean) difference is rather small ( sub-percent for most of the points, except the last two).

ELECTRONS: Nphot_toe_deflib-fillbranch_ls_e16.pdf

PIONS: Nphot_toe_deflib-fillbranch_ls_pi16.pdf

abdoulline commented 1 year ago

Hi @Dr15Jones @civanch

I guess that there are the following ways/option to proceed:

(0) do nothing = preserving reproducibility of Run2 HF results, while there is already (as small) Run2 MC reproducibility break due to the fix https://github.com/cms-sw/cmssw/pull/39967 (which changes GEANT history)

(1) apply an "ugly patch" [1] to exclude missing 119 entries from 7 GeV, so that all other "subsequent" energy points would use their appropriate entries (5000) to get fully correct results with the actual library (v4), requires only minimal code update

(2) replace actual v4 by new v5 (+ "fileVersion=1") with recovered 119 entries for point e=7 GeV and containing full separate branches for electrons and pions - generated by @kheyn, as he has reported earlier today,
requires: (i) a request for adding new library to CMSSW SDT repository, (ii) small code (HFShoweLibrary.cc) + config changes:
https://github.com/cms-sw/cmssw/compare/master...abdoulline:cmssw:HFShowerLibrary_fix_Run2
NB: I've quickly tried 2018 TTbar MC (runTheMatrix.py -l 10824.0) with this option and it runs OK (at least) on 10 ev.

(3) replace Run2 v4 in question by the actual (improved) Run3 v6, means to backport Run3 library usage to Run2 era, (some ~2-5% difference may be expected in global HF variables), requires small config change


[1] Ugly patch: replacing the following fragment: https://github.com/cms-sw/cmssw/blob/master/SimG4CMS/Calo/src/HFShowerLibrary.cc#L593-#L600

    if (j == nMomBin_ - 2) {
      irc[1] = int(evtPerBin_ * 0.5 * r);
    } else {
      irc[1] = int(evtPerBin_ * r);
    }
    irc[1] += (j + 1) * evtPerBin_ + 1;
    r = G4UniformRand();
    irc[0] = int(evtPerBin_ * r) + 1 + j * evtPerBin_;

With:

     static const int SHIFT_PATCH = 119; //  in .h

    for (int k = 0; k < 2; ++k) {
       double r = G4UniformRand();
       int jk = j + k;
       int ir = jk * evtPerBin_ + 1;
       if (fileVersion_ >= 2) {
          ir += int(evtPerBin_ * r);
       } else {
          if (jk < 3) {
             ir += int(evtPerBin_ * r);
          } elseif (jk == 3) {
             ir += int((evtPerBin_ - SHIFT_PATCH) * r); 
          } else { 
             ir += int(evtPerBin_ * r) - SHIFT_PATCH;
          }
      }
      irc[k] = ir;
    }
kheyn commented 1 year ago

The deficit in 119 entries, happening at 7 GeV, results in shift of entries in branches at higher energies. That shift mostly influences the last energy in branch. That is, while requesting entry for electron or hadron of 1 TeV, call to missing entry could happen. No crash appears in that case, the result is returned zero number of photons. I attach distribution of number of photons in long fibers for electron of 1 TeV. nphot_distr2_long_el.pdf

civanch commented 1 year ago

@Dr15Jones , @abdoulline , @kheyn , it is good that the situation is understood. What to do next? For me there are only 2 optimal options :

1) "do nothing", because huge number of MC events are already produced 2) make modification proposed by Salavat as (1) "exclude 119 entries"- it would be fine if it would be possible to apply this code by flag, we will introduce a modifier, which will enable the flag by customisation

In contrary, if it is possible further improve the library/code for the Run3, we should do the best. In 2023 I would expect change of simulation, so a new library file and updated software would be fine.

abdoulline commented 1 year ago

@civanch I'm not sure I understand your : In contrary, if it is possible further improve the library/code for the Run3

Run3 HFShowerLibary file (as was presented in July-September 2021 in Simulation meetings) has

civanch commented 1 year ago

@abdoulline , I may be was imprecise. I mean that if any new improvement will be proposed for Run3 we can integrate without a problem. Concerning Run2 whatever we do we change SIM history. In that case, we will be asked to provide a modifier for the legacy release. Such modifier will not be needed for current master, so if we run Run2 simulation in the current master or later we will use improved library or code.

abdoulline commented 1 year ago

@civanch
I've put together a branch with the option (1) "ugly patch" https://github.com/cms-sw/cmssw/compare/master...abdoulline:cmssw:HF_patch_Run2

so that the patch can be applied by process.HFLibraryFileBlock.ApplyLibFix = True

I've tried it with (default) False and True in Run2 TTbar wf (10 ev)

and has got slightly different results.

civanch commented 1 year ago

@abdoulline , this is a safe approach, let us do this PR. I would guess it should be backported to 10_6_X, please, check if codes of modified classes were changed from 10_6 to 13_0. I do not think we need any other backport.

abdoulline commented 1 year ago

@civanch I'm personally not a big fan of this "ugly patch"...
@bsunanda is in favor of full-fledge fix, if I understand him correctly https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1334894972 Not sure our framework colleagues are big fans of this kind of "hacking"...

And about 10_6_X - why so much back?
There's a lot of changes since then in all the HFShowerLibrary-related code/configs, including "unification/sync." of parameters with FastSim...

abdoulline commented 1 year ago

@civanch (cc @Dr15Jones) may I suggest that this issue and its possible fixing options https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1350742388 be discussed not just in between you, Vladimir, and me, but in a broader circle, like Software release meeting or Simulation development forum? To have an insight from the framework colleagues, for instance?

civanch commented 1 year ago

Let us discuss at Friday meeting 15:30 CET. There will be limited number of peoples, it is equivalent of chat between us. We may put this as an item at the end or at the beginning of the meeting or discuss within News. For me the fixed new library file for Run2 would be also a good solution.

10_6_X is the Run2 legacy MC production release, whatever is needed for Run2 analysis should be backported there.

abdoulline commented 1 year ago

OK, let me put together several slides for Friday, summarizing what's listed/shown in the posts here.
Could you, please, add an item to the agenda after the News? "Run2 HF shower library issue"

abdoulline commented 1 year ago

Undertaken action: https://github.com/cms-sw/cmssw/pull/40357

civanch commented 1 year ago

@abdoulline , may be it is needed to backport https://github.com/cms-sw/cmssw/pull/40357 to 10_6_X

abdoulline commented 1 year ago

@civanch there were different configurations (yet separately for FullSim and FastSim) back then, no any Run3 library, previous library unpacking/reading code was quite different. And I suppose that all what we've implemented (with Lev and Sunanda) last year in 12_X specifically for Run3 and (on top of that) recently by Chris Jones in his PR https://github.com/cms-sw/cmssw/pull/40278 is hardly compatible with the old release.
So, it's not about backporting per se, but rather about re-implementing (+ reverse adapting) of new Run3 stuff back to 10_6_X. And I'm afraid I'm not available for considering it in the near future...

For 10_6_X the least-intrusive (= minimal and transparent) option would be the aforementioned (1) "ugly patch" (adapted to 10_6_X FastSim and FullSim configs and code) https://github.com/cms-sw/cmssw/issues/40218#issuecomment-1352722374

civanch commented 1 year ago

@abdoulline , Thanks for detailed answer. it seems backport is possible but heavy task. Let us stop at this point.

civanch commented 1 year ago

+1

cmsbuild commented 1 year ago

This issue is fully signed and ready to be closed.

civanch commented 1 year ago

Now Lev provide a new file for Run3 HFShowerLibrary_run3_v7.root, request to add it to CMSDIST: https://github.com/cms-sw/cmsdist/issues/8288

makortel commented 1 year ago

@cmsbuild, please close