eic / epic-analysis

General (SI)DIS analysis framework for the EIC
GNU Lesser General Public License v3.0

Converter for config files to support parallelization #235

Closed c-dilks closed 1 year ago

c-dilks commented 1 year ago

Is your feature request related to a problem? Please describe.

With a single config file, high-statistics data can only be read serially. We need the ability to process data in parallel, and the easiest way is to process one upstream ROOT file per thread (or per computing cluster node).

Describe the solution you'd like

Write a script that takes a config file and converts it into one config file per ROOT file. Each single-file config file should be fully usable in epic-analysis, causing it to process only that file (in other words, it must contain all the key-value pairs needed).
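A minimal sketch of such a converter, in Python. The config syntax here is an assumption made for illustration only: lines starting with `:` are key-value settings that stay in effect for the files listed after them, `#` starts a comment, and every other non-empty line is a ROOT file path. The function name `split_config` and the output naming scheme are hypothetical as well.

```python
from pathlib import Path


def split_config(config_path, out_dir):
    """Write one single-file config per ROOT file listed in config_path.

    Assumed (hypothetical) syntax: ':key value' lines are settings,
    '#' starts a comment, anything else is a ROOT file path.
    Returns the list of config files written.
    """
    config_path = Path(config_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    settings = {}  # key -> value currently in effect
    written = []
    for raw in config_path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith(":"):
            parts = line.split(None, 1)
            settings[parts[0]] = parts[1] if len(parts) > 1 else ""
            continue
        # Any other line is treated as a ROOT file path: emit a config
        # holding every setting in effect at this point plus this one file.
        out_path = out_dir / f"{config_path.stem}.{len(written):04d}.config"
        body = [f"{key} {value}".rstrip() for key, value in settings.items()]
        body.append(line)
        out_path.write_text("\n".join(body) + "\n")
        written.append(out_path)
    return written
```

Copying the full set of settings into every output file is what keeps each single-file config usable on its own.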

To handle Q2 weighting, the script should also count the TOTAL number of events in each Q2 range. This requires opening each TTree and counting its entries, which should be done serially, one file at a time, since a TChain may not be able to handle ~1000 trees. The total event count would then be written to each single-file config file, and epic-analysis would be updated to use that event count for the Q2 weighting.
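The serial counting could look roughly like the sketch below. The entry counter is passed in as a callable so the bookkeeping can be shown without ROOT installed. `count_entries_pyroot` is one possible counter using PyROOT, and the tree name `events` is an assumption, not something taken from epic-analysis.

```python
from collections import defaultdict


def count_events_per_q2_range(entries, count_entries):
    """Sum TTree entries per Q2 range, one file at a time.

    entries: iterable of (q2min, q2max, root_file) tuples.
    count_entries: callable returning the number of entries in root_file.
    Returns {(q2min, q2max): total_events}.
    """
    totals = defaultdict(int)
    for q2min, q2max, root_file in entries:
        # Files are opened one after another, never chained together.
        totals[(q2min, q2max)] += count_entries(root_file)
    return dict(totals)


def count_entries_pyroot(root_file, tree_name="events"):
    """Possible counter using PyROOT; tree_name is an assumed default."""
    import ROOT  # imported lazily so the rest works without ROOT

    tfile = ROOT.TFile.Open(root_file)
    try:
        return int(tfile.Get(tree_name).GetEntries())
    finally:
        tfile.Close()
```

The resulting totals are what would be written into every single-file config that belongs to the corresponding Q2 range.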

Once we have one config file per upstream ROOT file, we need to write support scripts for HTCondor and Slurm that run epic-analysis on each config file, one per node.
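For the Slurm side, one option is a job array with one task per config file. The sketch below only generates the batch script text. `./run-analysis.sh` is a hypothetical placeholder for whatever command runs epic-analysis on a single config file, and a real script would likely need site-specific `#SBATCH` options (partition, time limit, memory) that are not shown.

```python
def make_slurm_array_script(config_paths, command="./run-analysis.sh"):
    """Return a Slurm batch script that runs `command` once per config file.

    command is a hypothetical placeholder, not an epic-analysis executable.
    """
    config_paths = [str(p) for p in config_paths]
    if not config_paths:
        raise ValueError("need at least one config file")
    lines = [
        "#!/bin/bash",
        # One array task per config file, indexed from 0.
        f"#SBATCH --array=0-{len(config_paths) - 1}",
        "#SBATCH --ntasks=1",
        "CONFIGS=(",
        *[f'  "{path}"' for path in config_paths],
        ")",
        # Each task picks its own config via the array task ID.
        f'{command} "${{CONFIGS[$SLURM_ARRAY_TASK_ID]}}"',
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and submitted with `sbatch`. An HTCondor submit file could be generated from the same list of config files in a similar way.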