cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.3k forks source link

Event auxiliary access patterns #24780

Open dan131riley opened 6 years ago

dan131riley commented 6 years ago

EventAuxiliary is a small record with the EventID, ProcessHistoryID, and a few other items that is currently stored as a branch in the EventTree, written with the same flush interval as everything else in the event tree. When checking for duplicate events is enabled, the entire EventAuxiliary branch is read at the beginning of the job so that the presence or absence of duplicate events can be used in the decision to enable or disable fast cloning--if there are duplicates to be removed, then fast cloning has to be disabled. This results in many small reads scattered throughout the input files (all the input files when duplicate removal across files is enabled). On a storage system with a large read size, this can result in reads of a large fraction of the input files at job start.

Options for amelioration:

  1. Relax or disable the duplicate check for some job classes (duplicate removal is arguably not so useful for split jobs)
  2. Store the EventIDs in a MetaData branch written at the end of the file. Probably the most natural way to do this is by making IndexIntoFile::unsortedEventNumbers_ persistent.

In addition, even if duplicate event checking is disabled, the TTreeCache can still end up prefetching the bulk of the EventAuxiliary branch. I'll submit a PR for this shortly.

From an email thread with @Dr15Jones and @bbockelm

cmsbuild commented 6 years ago

A new Issue was created by @dan131riley Dan Riley.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 6 years ago

We must consider the FWLite use case where physicists are looking at the data directly in ROOT. Moving the event identify information out of the Events Three would be very confusing for analysis.

dan131riley commented 6 years ago

I agree, which is why I switched from moving the entire EventAuxiliary branch (yesterday's proposal) to instead persisting IndexIntoFile::unsortedEventNumbers_ and (I forgot to highlight this) leaving the EventAuxiliary branch in the EventTree as is. This would mean some redundancy of event numbers or EventIDs (they should compress well).

Dr15Jones commented 6 years ago

How about we look at this in a different way. Could we come up with a memory efficient way of spotting duplicate events in the OutputModule? If seen then, we could just set a bool in the meta data which would warn that one can't do fast cloning.

Dr15Jones commented 6 years ago

assign core

cmsbuild commented 6 years ago

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

dan131riley commented 6 years ago

The default duplicateCheckMode is "checkAllFilesOpened", which checks for duplicates across all input files. Knowing if one file has no duplicates doesn't help with that. For a "no duplicates in file" flag to be useful, we'd have to be using one of the modes that checks each input file independently. (I am in favor of making that change for split jobs, since the all files check doesn't give reproducible results if the job splitting isn't deterministic.)

dan131riley commented 5 years ago

Getting back to this, I had an idea and did a quick proof-of-concept for a possible approach here:

https://github.com/cms-sw/cmssw/compare/master...dan131riley:clustering-EventAuxiliary2

There are 10 items in EventAuxiliary; 5 of them change very frequently, and 5 change very infrequently. In this approach I run-length encode the 5 that change infrequently and keep the 5 that change frequently every event, with some additional space optimization by putting the processGUID in a dictionary. This takes much less memory than the previous approach: for 1,000,000 empty events, where there's only instance of the items that change infrequently, the RSS expansion is 40 MB (vs. 270 MB for the previous version). On 100,000 MINIAOD data events with 8 lumi sections, RSS expansion was around 25 MB.

One problem with this approach is that it is somewhat fragile--if something is added to EventAuxiliary (unlikely), then RootOutputFile needs to be updated one way or another. This could perhaps be ameliorated somewhat by restructuring EventAuxiliary to separatethe slow and fast changing parts, so any updates would be entirely within EventAuxiliary.

So I'm looking for feedback whether this is worth finishing, or if I should be looking at alternatives. Obvious alternatives are to write the EventAuxiliary to a TMemFile and let ROOT do the compression, or do a file format revision that persists IndexIntoFile::unsortedEventNumbers_.

Dr15Jones commented 5 years ago

@dan131riley I'm in support of this idea. I'd even go so far as to consider modifying the file format to better accomodate some of the changes.