This commit adds the option to use TChains for combining the TTrees of the grid-control (GC) output files instead of merging them with `hadd`.
Usage: add `--pseudo-hadd` to the `excalibur.py` call. This only works when the SE path uses `xrootd` instead of `srm`.
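For example (the config name here is a placeholder, not a real file from this repo):

```sh
# hypothetical invocation; pass your usual config plus the new flag
excalibur.py my_analysis_config.py --pseudo-hadd
```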
The output is now only a few MB, as the source files from the job outputs (hopefully stored at NRG) are merely linked, not copied.
This gets rid of the `hadd` step and hence saves a lot of time after GC finishes: for me, `hadd` with a target on the Ceph mount can take multiple hours for a single run period (e.g. 2018 Run D, > 200 GB output), while the pseudo-hadd takes only a couple of seconds.
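As a rough sketch of the idea (not the exact code in this commit): the tree name `"Events"`, the xrootd host, and the file paths below are all placeholder assumptions.

```python
import ROOT

# Build a TChain over the remote job outputs instead of merging them.
chain = ROOT.TChain("Events")  # tree name is an assumption
for i in range(309):  # e.g. the 309 DY MC job outputs mentioned below
    chain.Add("root://xrootd.example.org//store/user/jobs/output_%d.root" % i)

# Writing the chain stores only the list of linked files plus some
# metadata, so the "merged" output stays at a few MB regardless of
# how large the job outputs are.
out = ROOT.TFile("pseudo_hadd.root", "RECREATE")
chain.Write()
out.Close()
```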
Compared to a local copy on the Ceph mount, this also yields speed-ups when reading the files, e.g. with Lumberjack (25 cores):
- DY MC sample: 364.453 s (single file on Ceph) vs. 159.888 s (309 files on NRG)
- 2018 Run D: 262.997 s (single file on Ceph) vs. 189.587 s (380 files on NRG), plus lower average CPU usage, since there is no overhead from the Ceph mount
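Reading the pseudo-hadded file back looks like opening any other ROOT file; a minimal sketch (file and tree names are the same assumptions as above, and this is not necessarily how Lumberjack reads it):

```python
import ROOT

f = ROOT.TFile.Open("pseudo_hadd.root")
chain = f.Get("Events")  # the stored TChain; data is pulled from the
                         # linked xrootd files only when entries are read
print("total entries across linked files:", chain.GetEntries())
f.Close()
```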