- selectedPhotons: For some reason, awkward does not automatically load this as a record so that we can do e.g. selectedPhotons.pt. So, I added a function in photon_selections.py to do this manually. Related, selectedPhotons seemingly needs to be loaded separately from each file -- otherwise, awkward creates 2d arrays of other branches like gg_mass. I don't entirely understand why this is needed, but it works.
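For context, a minimal sketch of the kind of record-building described above, assuming flat branches like selectedPhotons_pt and a file/tree layout that may differ from the actual skims (this is not the photon_selections.py implementation itself):

```python
import awkward as ak
import uproot

# Hypothetical file, tree, and branch names -- the real v5 skims may differ.
tree = uproot.open("skim.root")["Events"]
photons = tree.arrays(
    ["selectedPhotons_pt", "selectedPhotons_eta", "selectedPhotons_phi"],
    library="ak",
)

# Zip the flat photon branches into a single jagged record array so that
# fields can be accessed as selectedPhotons.pt, selectedPhotons.eta, etc.
selectedPhotons = ak.zip({
    "pt": photons["selectedPhotons_pt"],
    "eta": photons["selectedPhotons_eta"],
    "phi": photons["selectedPhotons_phi"],
})

print(selectedPhotons.pt)
```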
Can you give examples of the failure cases? This worries me a little -- how do we make sure we always catch this small caveat when writing the code? One careless mistake would make the downstream behavior (bug) difficult to catch.
If I load the branches all together (i.e. don't load photons separately), I get:

```
  File "/home/users/smay/Hgg/HggAnalysisDev/Preselection/selections/selection_utils.py", line 15, in add_cuts
    n_events_cut = len(self.events[cut])
  File "/home/users/smay/Hgg/HggAnalysisDev/env/lib64/python3.6/site-packages/awkward/highlevel.py", line 1005, in __getitem__
    return ak._util.wrap(self._layout[where], self._behavior)
MemoryError: std::bad_alloc
```
From debugging, I found that this was due to awkward creating 2d arrays of gg_* variables. If I load photons separately, this doesn't happen.
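A toy illustration of how a flat per-event branch can become 2d under awkward broadcasting (not necessarily exactly what happens inside load_file, and the values here are made up):

```python
import awkward as ak

# Toy example: a flat per-event branch and a jagged per-photon branch.
gg_mass = ak.Array([125.0, 120.0])
photon_pt = ak.Array([[30.0, 25.0], [40.0]])

# Broadcasting the flat branch against the jagged one duplicates it per photon,
# turning a 1d per-event array into a 2d per-photon array.
gg_mass_2d, _ = ak.broadcast_arrays(gg_mass, photon_pt)
print(ak.to_list(gg_mass_2d))  # [[125.0, 125.0], [120.0]]
```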
For your concern "how do we make sure we always catch this?": aside from the fact that the code crashes if we don't do things this way, this is implemented directly in photon_selections.py and in the load_file function. This is shared across all analyses (so far just ttH and HH->ggTauTau), so someone would have to explicitly change this to reintroduce the bug.
Given that Leonardo reproduced exact event counts with the new code, I would say this is "weird" but not concerning.
No, I was not worried about this particular feature; I mean more about other small details like this that come with columnar tools -- maybe it's a matter of getting used to them. If something is crashing, then it's easy to catch the problem; the harder case is when a number doesn't make sense, but one won't notice without a cross-check against related numbers.
But anyway, this is just a small note/concern that doesn't affect the validity of this PR.
Right, this is a good point. I think one solution for this would be to define a unit test, as you suggested a while ago.
I would think that once we sync the ttH yields a bit more closely, we could use the ttH Leptonic preselection as the unit test.
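A minimal sketch of what that unit test could look like (the reference yield and the summary-file format below are placeholders, not synced numbers or existing loop.py output):

```python
import json
import subprocess

# Placeholder reference yield -- to be filled in once the ttH yields are synced.
EXPECTED_N_EVENTS = 12345

def test_ttH_leptonic_presel(tmp_path):
    """Run the ttH Leptonic preselection and compare the final event count
    against a frozen reference value."""
    output_tag = str(tmp_path / "ttH_unit_test")
    subprocess.run(
        [
            "python", "loop.py",
            "--selections", "ttH_LeptonicPresel",
            "--options", "data/ttH_Leptonic.json",
            "--samples", "data/samples_and_scale1fb_ttH.json",
            "--output_tag", output_tag,
        ],
        check=True,
    )

    # Assumes the looper writes a json summary with an "n_events" field --
    # adapt this to whatever loop.py actually writes out.
    with open(output_tag + "_summary.json") as f:
        summary = json.load(f)
    assert summary["n_events"] == EXPECTED_N_EVENTS
```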
Includes relevant changes to run on v5 skims, plus some cleaning up of the code, including:

- selectedPhotons: For some reason, awkward does not automatically load this as a record so that we can do e.g. selectedPhotons.pt. So, I added a function in photon_selections.py to do this manually. Related, selectedPhotons seemingly needs to be loaded separately from each file -- otherwise, awkward creates 2d arrays of other branches like gg_mass. I don't entirely understand why this is needed, but it works.
- diphoton_selections.py: diphoton_preselection (since this should be the same for every H->gg analysis), while lepton/tau/jet variables are computed inside respective functions in analysis_selections.py (since this can differ between analyses) -- see the sketch after this list.
- HHggTauTau_InclusivePresel -- these should be relevant for rejecting tt+X and ttH in BDT training.
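As a rough illustration of that split (the cut values and function signatures here are made up for the sketch, not the actual contents of diphoton_selections.py or analysis_selections.py):

```python
# Sketch only: "events" and "taus" are assumed to be awkward record arrays
# with fields like gg_mass and pt.

# diphoton_selections.py -- shared by every H->gg analysis
def diphoton_preselection(events):
    # Illustrative diphoton mass window only; the real preselection has more cuts.
    cut = (events.gg_mass > 100.0) & (events.gg_mass < 180.0)
    return events[cut]

# analysis_selections.py -- analysis-specific object/variable selections
def tau_selection(taus):
    # e.g. the HH->ggTauTau preselection would add its tau requirements here
    return taus[taus.pt > 20.0]
```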
The HHggTauTau_InclusivePresel can be run with the following command:

```
/bin/nice -n 19 python loop.py --selections "HHggTauTau_InclusivePresel" --nCores 24 --debug 1 --options "data/HH_ggTauTau_default.json" --output_tag <your_output_tag>
```
If you run on just data and signal (by adding --select_samples "Data,HH_ggTauTau"), it should take less than 10 minutes:

```
[LoopHelper] Total time to run 77 jobs on 24 cores: 7.85 minutes
```
And the ttH Leptonic preselection can be run with:

```
/bin/nice -n 19 python loop.py --selections "ttH_LeptonicPresel" --nCores 24 --debug 1 --options "data/ttH_Leptonic.json" --samples "data/samples_and_scale1fb_ttH.json" --output_tag <your_output_tag>
```