Time to Draw plots using the maps

BrieucF commented 8 years ago

I am opening this issue to discuss and share our experience with the map usage.

I am surprised by the time needed to analyze the output of the framework using the maps: for instance, it takes 1 min 40 s to plot the first lepton pt in the ElEl category on a TTTo2L2Nu file with 60k events:

root /storage/data/cms/store/user/obondu/TTTo2L2Nu_13TeV-powheg/TTTo2L2Nu_13TeV-powheg_MiniAODv2/151026_105643/0000/output_mc_12.root

t->Draw("hh_leptons[hh_llmetjj[hh_map_llmetjj_id_iso_btagWP_pair[674]].ilep1].p4.Pt()","elel_category && hh_ll[hh_llmetjj[hh_map_llmetjj_id_iso_btagWP_pair[674]].ill].isElEl")

I launched an instance of histFactory with 156 plots on condor, giving only one ROOT file per job, and most of the jobs took ~12 hours. The job for our signal (240k events) is still running two days later...

Can you please share your experience with the map usage? This would help me to understand if it is specific to our way of using them. If it is related to the map usage itself, maybe we could rethink this part.

swertz commented 8 years ago

Hi, I needed about 6 hrs for jobs running on single DY files (~16,000 events each) to produce 2800 histograms. Could this be related to the fact that you had ~10x more entries in your plots than we do?

blinkseb commented 8 years ago

Hi,

I spent some time profiling the code of histFactory and, unfortunately, most of the time is spent evaluating the TTreeFormula, not actually reading the tree. The only solutions here are to reduce the complexity of the formulas (I'm not sure, but nested formulas seem to be really time-consuming), or to wait for the ROOT team to implement TTreeFormula using cling (not for tomorrow...).

BrieucF commented 8 years ago

Thanks for the comments. Indeed, the big difference with the previous set of plots is that we now need to go through several collections in order to access the indices of the objects. We can discuss that together, but to me O(1 day) to make the plots seems a big price to pay for flexibility, given that a full ntuple production is of the same order of magnitude.
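To make the "several collections" point concrete, here is a minimal standalone Python sketch of the chain of lookups that an expression like hh_leptons[hh_llmetjj[hh_map_llmetjj_id_iso_btagWP_pair[674]].ilep1].p4.Pt() has to resolve for every event. It does not use ROOT. The classes (P4, Lepton, Candidate, Event) and the helper first_lepton_pt are hypothetical stand-ins, not the framework's actual types, and it assumes that each map entry holds a single candidate index. Only the member names ilep1 and ill are taken from the Draw expression above.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class P4:
    px: float
    py: float

    def pt(self) -> float:
        return hypot(self.px, self.py)


@dataclass
class Lepton:
    p4: P4


@dataclass
class Candidate:
    ilep1: int  # index into the lepton collection
    ill: int    # index into the dilepton collection


@dataclass
class Event:
    # Hypothetical layout: each attribute stands in for one branch.
    cand_map: List[int]       # stands in for hh_map_llmetjj_id_iso_btagWP_pair
    llmetjj: List[Candidate]  # stands in for hh_llmetjj
    leptons: List[Lepton]     # stands in for hh_leptons


def first_lepton_pt(event: Event, map_slot: int) -> float:
    """Follow map -> candidate -> lepton, then compute pt."""
    i_cand = event.cand_map[map_slot]         # lookup 1: map entry
    cand = event.llmetjj[i_cand]              # lookup 2: candidate
    return event.leptons[cand.ilep1].p4.pt()  # lookup 3: lepton, then pt


event = Event(
    cand_map=[0],
    llmetjj=[Candidate(ilep1=1, ill=0)],
    leptons=[Lepton(P4(3.0, 4.0)), Lepton(P4(6.0, 8.0))],
)
print(first_lepton_pt(event, 0))
```

Each lookup depends on the result of the previous one, and the selection string in the Draw call repeats the first two lookups to reach the dilepton collection.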

OlivierBondu commented 8 years ago

One alternative would be to store the dilepton / dijet / etc. specific info directly in the llmetjj candidate... this would avoid the largest part of the nested calls?
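A rough sketch of that idea, in plain Python without ROOT: the indices are resolved once, when the candidates are written, and the resulting values are copied into the candidate itself. All names here (IndexedCandidate, FlatCandidate, flatten, lep1_p4, is_elel) are hypothetical and only illustrate the layout; they are not the framework's actual classes or branches.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class P4:
    px: float
    py: float

    def pt(self) -> float:
        return hypot(self.px, self.py)


@dataclass
class IndexedCandidate:
    ilep1: int  # index into the lepton collection
    ill: int    # index into the dilepton collection


@dataclass
class FlatCandidate:
    lep1_p4: P4    # copied from the lepton collection
    is_elel: bool  # copied from the dilepton collection


def flatten(
    cands: List[IndexedCandidate],
    lepton_p4s: List[P4],
    dilepton_is_elel: List[bool],
) -> List[FlatCandidate]:
    """Resolve the indices once, at the time the candidates are stored."""
    return [
        FlatCandidate(lep1_p4=lepton_p4s[c.ilep1], is_elel=dilepton_is_elel[c.ill])
        for c in cands
    ]


lepton_p4s = [P4(3.0, 4.0), P4(6.0, 8.0)]
dilepton_is_elel = [True]
flat = flatten([IndexedCandidate(ilep1=1, ill=0)], lepton_p4s, dilepton_is_elel)

# At plotting time the values are read directly from the candidate.
print(flat[0].lep1_p4.pt(), flat[0].is_elel)
```

The trade-off is that lepton and dilepton information is duplicated in every candidate that refers to it, so the stored candidates get larger. The lookup through the map to select the candidate is still needed.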

OlivierBondu commented 8 years ago

@BrieucF: can this be closed?

BrieucF commented 8 years ago

Yes!

cp3-llbb / Framework

Time to Draw plots using the maps #89