Open kkrizka opened 6 years ago
Yeah, I'm not sure how easily feasible this is. Not many people are going to be good enough to be able to do trigger bit decisions at the ntuple level, especially for those who are joining ATLAS now. In most analyses, I only see ~5-10 trigger decisions being stored. Storing 50 does seem like a lot... It's an interesting thought. If you already specify the list of triggers you want to store, is it possible to store a function that calculates the trigger bit given a series of trigger names, and then you can search for that?
Hi @kratsg, are you suggesting adding the output of this function in addition to what @kkrizka suggests? I feel like adding only this output would reduce the freedom of the user downstream to experiment with different trigger lists. This is especially true if common ntuples are produced in an analysis. If we decide on this combined approach, may I suggest storing a vector for each trigger, where the first element is the trigger bit and the second the prescale?
@fscutti so no. What this effectively amounts to is requiring a consistent way of mapping input triggers to a fixed vector of trigger strings so that you just store a vector of prescales per event knowing that the order of the vector is well-defined... similarly with trigger bits for passing. The question really is, how do we sort/predetermine that order in an entirely generic / configurable way that doesn't place undue burden on the end user?
An example is to provide a python script that parses the config.py/config.json someone uses, extracts the trigger, and provides the necessary order... but then keeping that up to date with the C++ code becomes somewhat hard to do.
The other option might be to use a friend tree -- where the friend tree has a single row listing the trigger strings, and if you want to get the trigger names into your trees, just add a friend tree to link things up (join).
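To make the fixed-order idea concrete, here is a minimal sketch in plain C++ (no ROOT), assuming the ordered name list is written once per file (for example as the single row of a friend tree) and each event only carries vectors aligned with that list. `TriggerTable`, `EventTriggers`, `passedTrigger` and the trigger names used with them are hypothetical and not part of the existing code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: stored once per file (e.g. one-row friend tree or metadata).
// The position of a name in this list defines its index in every event.
struct TriggerTable {
    std::vector<std::string> names;

    // Return the position of `name`; throw if it was never configured.
    std::size_t indexOf(const std::string& name) const {
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i] == name) return i;
        }
        throw std::out_of_range("unknown trigger: " + name);
    }
};

// Hypothetical: stored per event; entries line up with TriggerTable::names.
struct EventTriggers {
    std::vector<bool>  passed;
    std::vector<float> prescale;
};

// Resolve the index once per job, then reuse it for every event.
inline bool passedTrigger(const EventTriggers& evt, std::size_t idx) {
    return evt.passed.at(idx);
}
```

The name lookup happens once per job rather than once per event, so the per-event cost is a single indexed read. The open question from above remains: something still has to guarantee that the order in the table matches the order used when the events were filled.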
You could use an `std::unordered_map` instead of a vector. Then you would only need a single, general map of trigger names to numeric IDs.
You could even map from an `enum class`, though this would require providing a (trivial) specialization of `std::hash` to be C++11 compatible.
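A rough sketch of both variants, assuming IDs are simply handed out in order of first appearance. `TriggerRegistry`, the `Trigger` enum and its enumerators are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical helper: gives each trigger name a small numeric id
// the first time it is seen, and returns the same id afterwards.
class TriggerRegistry {
public:
    std::uint16_t idFor(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::uint16_t id = static_cast<std::uint16_t>(ids_.size());
        ids_.emplace(name, id);
        return id;
    }
    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::uint16_t> ids_;
};

// Hypothetical enum key. C++11 does not provide std::hash for enum
// types, so a trivial specialization is supplied here.
enum class Trigger : std::uint16_t { ExampleA, ExampleB };

namespace std {
template <>
struct hash<Trigger> {
    size_t operator()(Trigger t) const noexcept {
        return hash<std::uint16_t>()(static_cast<std::uint16_t>(t));
    }
};
}  // namespace std
```

With the specialization in place, `std::unordered_map<Trigger, float>` can be used directly, for example to hold prescales keyed by trigger.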
Edit isn't working. `enum class` is probably a bad idea, given the sheer number of triggers there are.
Hi all,
I was not proposing to have a single bit string for triggers. I was thinking of a different branch per trigger decision, similar to what was used in the Run 1 ntuples.
-- Karol Krizka
I am looking at reducing the size of my ntuples. I made some quick plots of the space that different branches take (via `TBranch::GetTotalSize()`). I split the branches into categories based on the word before the first `_`. If the word is not jet, fatjet, muon, el or ph, the branch is put into the event category.

I put the composition of my data ntuples at the bottom. The event category takes up about 20% of the ntuples. Of that, over half is taken up by `triggerNames` (I run with #1184 applied; the branch is `isPassedBitsNames` in master). Probably not too surprising, since each trigger is stored as a lengthy set of characters (up to 20 for the large-R jet triggers). If you have several triggers, things add up...

Might be worth rethinking how the trigger information is stored. My first thought is to have a boolean branch per trigger named `triggername` (or a float `triggername_prescale`), similar to what the old `NTUP_COMMON` used. Might be faster, since one does not have to do a linear search through a list to determine a trigger decision. Not sure how nice this would be if the complete trigger list is not known at run time (i.e. triggers added/removed for the different data periods).

@kratsg @ntadej Thoughts? Maybe I am the only one who stores a lot of trigger decisions (~50)....
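For illustration, the two layouts can be contrasted in plain C++ (no ROOT). This is only a sketch: it assumes the name-based branch holds the names of the triggers that passed in each event, and `passedByName`, `EventFlags` and the trigger names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Name-list layout: the event stores trigger names as strings, so every
// decision is a linear search with string comparisons.
inline bool passedByName(const std::vector<std::string>& passedTriggers,
                         const std::string& trigger) {
    return std::find(passedTriggers.begin(), passedTriggers.end(), trigger)
           != passedTriggers.end();
}

// Per-trigger layout: one boolean (and optionally one float prescale)
// per trigger, mirroring one branch per trigger in the ntuple.
struct EventFlags {
    bool  HLT_example_a = false;
    float HLT_example_a_prescale = 1.0f;
    bool  HLT_example_b = false;
    float HLT_example_b_prescale = 1.0f;
};
```

In the second layout a decision is a direct read such as `evt.HLT_example_b`, and no strings are stored per event. The trade-off is the one already raised: the set of members (branches) has to be known when the ntuple is written.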