Open rahmans1 opened 4 days ago
Are these storage numbers normalized for the actual number of events per chunk? That might (must to some degree) be different between the campaigns.
Not normalized for events per chunk. But both of these files have ~500 events: the file from 24.11.1 has 531 and the file from 24.10.0 has 509. That shouldn't make the branch sizes differ so drastically, I think.
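Normalizing by entry count is a one-line check. Below is a minimal sketch that uses the tree totals from the `events->Print()` output in this thread: 240389363 bytes for 509 entries in 24.10.0, and 5805017410 bytes for 531 entries in 24.11.1. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed bytes per event for one tree (hypothetical helper).
double bytesPerEvent(std::int64_t totalBytes, std::int64_t entries) {
    assert(entries > 0);
    return static_cast<double>(totalBytes) / static_cast<double>(entries);
}

// Ratio of per-event sizes, new campaign over old campaign.
double perEventGrowth(std::int64_t oldBytes, std::int64_t oldEntries,
                      std::int64_t newBytes, std::int64_t newEntries) {
    return bytesPerEvent(newBytes, newEntries) / bytesPerEvent(oldBytes, oldEntries);
}
```

With those numbers, `perEventGrowth(240389363, 509, 5805017410, 531)` comes out at about 23.1, compared with about 24.1 without normalization. The entry-count difference therefore accounts for only a few percent of the growth.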
The plot would be useful as a benchmark, with flagging (as in capybara) when there is a significant change.
Yep. I was thinking of adding it as a benchmark.
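One way such a benchmark could flag regressions is to compare per-branch uncompressed sizes between a reference campaign and a candidate campaign. It would then report any branch whose per-event size grows by more than a chosen factor. This sketch covers only the comparison step and makes no claim about how capybara itself works. The struct, the function name, and the 2x default threshold are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Per-branch uncompressed sizes (bytes) plus the number of events in the file.
struct CampaignSizes {
    std::int64_t entries;
    std::map<std::string, std::int64_t> branchBytes;
};

// Return names of branches whose bytes-per-event grew by more than `threshold`
// relative to the reference. Branches missing from the reference (or with a
// zero reference size) are always flagged, since there is no baseline.
std::vector<std::string> flagGrowth(const CampaignSizes& ref,
                                    const CampaignSizes& cand,
                                    double threshold = 2.0) {
    assert(ref.entries > 0 && cand.entries > 0);
    std::vector<std::string> flagged;
    for (const auto& [name, bytes] : cand.branchBytes) {
        auto it = ref.branchBytes.find(name);
        if (it == ref.branchBytes.end() || it->second == 0) {
            flagged.push_back(name);
            continue;
        }
        const double refPerEvent = static_cast<double>(it->second) / ref.entries;
        const double candPerEvent = static_cast<double>(bytes) / cand.entries;
        if (candPerEvent > threshold * refPerEvent) flagged.push_back(name);
    }
    return flagged;  // sorted by name, since std::map iterates in key order
}
```

Normalizing per event keeps files with slightly different entry counts (509 vs 531 here) from tripping the flag on their own.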
Magic 512 event boundary on bucket filling? If > 512, new bucket, which brings fixed overhead?
Wowsers.
Cycles
24.10.0
root [1] .ls
TNetXNGFile** root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.10.0/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
TNetXNGFile* root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.10.0/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
KEY: TTree events;2 events data tree [current cycle]
KEY: TTree events;1 events data tree [backup cycle]
KEY: TTree podio_metadata;1 metadata tree for podio I/O functionality
24.11.1
root [1] .ls
TNetXNGFile** root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.11.1/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
TNetXNGFile* root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.11.1/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
KEY: TTree events;27 events data tree [current cycle]
KEY: TTree events;26 events data tree [backup cycle]
KEY: TTree podio_metadata;1 metadata tree for podio I/O functionality
-> increase from 2 to 27 cycles
Baskets (not buckets)
24.10.0:
root [1] events->Print("_TrackerEndcapHits_MCParticle*")
******************************************************************************
*Tree :events : events data tree *
*Entries : 509 : Total = 240389363 bytes File Size = 67998112 *
* : : Tree compression factor = 3.51 *
******************************************************************************
*Br 0 :_TrackerEndcapHits_MCParticle : *
* | Int_t _TrackerEndcapHits_MCParticle_ *
*Entries : 509 : Total Size= 6813 bytes File Size = 3410 *
*Baskets : 2 : Basket Size= 32000 bytes Compression= 1.26 *
*............................................................................*
*Br 1 :_TrackerEndcapHits_MCParticle.index : *
* | Int_t index[_TrackerEndcapHits_MCParticle_] *
*Entries : 509 : Total Size= 52505 bytes File Size = 16493 *
*Baskets : 3 : Basket Size= 57344 bytes Compression= 3.14 *
*............................................................................*
*Br 2 :_TrackerEndcapHits_MCParticle.collectionID : *
* | UInt_t collectionID[_TrackerEndcapHits_MCParticle_] *
*Entries : 509 : Total Size= 52554 bytes File Size = 2980 *
*Baskets : 3 : Basket Size= 57344 bytes Compression= 17.38 *
*............................................................................*
24.11.1:
root [3] events->Print("_TrackerEndcapHits_MCParticle*")
******************************************************************************
*Tree :events : events data tree *
*Entries : 531 : Total = 5805017410 bytes File Size = 273198902 *
* : : Tree compression factor = 21.34 *
******************************************************************************
*Br 0 :_TrackerEndcapHits_MCParticle : *
* | Int_t _TrackerEndcapHits_MCParticle_ *
*Entries : 531 : Total Size= 11079 bytes File Size = 7026 *
*Baskets : 27 : Basket Size= 32000 bytes Compression= 1.01 *
*............................................................................*
*Br 1 :_TrackerEndcapHits_MCParticle.index : *
* | Int_t index[_TrackerEndcapHits_MCParticle_] *
*Entries : 531 : Total Size= 52969 bytes File Size = 26661 *
*Baskets : 27 : Basket Size= 9269 bytes Compression= 1.95 *
*............................................................................*
*Br 2 :_TrackerEndcapHits_MCParticle.collectionID : *
* | UInt_t collectionID[_TrackerEndcapHits_MCParticle_] *
*Entries : 531 : Total Size= 53186 bytes File Size = 6418 *
*Baskets : 27 : Basket Size= 9269 bytes Compression= 8.11 *
*............................................................................*
-> 2 or 3 baskets to 27.
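The numbers printed above already allow a test of the 512-event idea. If baskets were flushed once at a 512-entry boundary, 509 entries would fit in one basket and 531 in two. The 24.11.1 branches instead show 27 baskets, roughly 20 entries per basket. Below is a minimal sketch of that arithmetic. It takes the printed `Entries`/`Baskets` counts at face value, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Average number of tree entries stored per basket for one branch.
double entriesPerBasket(std::int64_t entries, std::int64_t baskets) {
    assert(baskets > 0);
    return static_cast<double>(entries) / static_cast<double>(baskets);
}

// Baskets expected if a basket were flushed every `entriesPerFlush` entries.
std::int64_t basketsForFlushInterval(std::int64_t entries, std::int64_t entriesPerFlush) {
    assert(entriesPerFlush > 0);
    return (entries + entriesPerFlush - 1) / entriesPerFlush;  // ceiling division
}
```

A 512-entry boundary would explain at most one extra basket, not 25. This points to much more frequent flushing in 24.11.1, though it does not by itself say what triggered the flushes.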
What is the tree compression factor variable?
Based on the change from
* : : Tree compression factor = 3.51 *
to
* : : Tree compression factor = 21.34 *
we are writing more zeros, compressing them well, but still ending up with larger files.
Based on the change from
*Entries : 509 : Total = 240389363 bytes File Size = 67998112 *
to
*Entries : 531 : Total = 5805017410 bytes File Size = 273198902 *
we are writing 24x more data.
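Both ratios in this argument can be reproduced from the `Print()` headers. The uncompressed total grows about 24x, while the on-disk size grows about 4x. Total bytes divided by file bytes gives 3.54 and 21.25, close to but not exactly ROOT's printed 3.51 and 21.34. The presumed reason is that ROOT's own byte accounting differs slightly, but that is an assumption. The struct and function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sizes as printed in the TTree::Print() header.
struct TreeSizes {
    std::int64_t totalBytes;  // "Total =" (uncompressed)
    std::int64_t fileBytes;   // "File Size =" (on disk, compressed)
};

// Approximate compression factor: uncompressed bytes per on-disk byte.
double compressionFactor(const TreeSizes& t) {
    assert(t.fileBytes > 0);
    return static_cast<double>(t.totalBytes) / static_cast<double>(t.fileBytes);
}

// Growth of the uncompressed size between two campaigns.
double totalGrowth(const TreeSizes& oldT, const TreeSizes& newT) {
    assert(oldT.totalBytes > 0);
    return static_cast<double>(newT.totalBytes) / static_cast<double>(oldT.totalBytes);
}

// Growth of the on-disk size between two campaigns.
double fileGrowth(const TreeSizes& oldT, const TreeSizes& newT) {
    assert(oldT.fileBytes > 0);
    return static_cast<double>(newT.fileBytes) / static_cast<double>(oldT.fileBytes);
}
```

A rising compression factor alongside a ~4x larger file is consistent with the reading above: much more, highly compressible data.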
Likely wrong question. I think you might want to plot the uncompressed total size for each branch.
root [5] auto* b = events->GetBranch("_EcalEndcapPTruthClusters_shapeParameters")
(TBranch *) 0x5819766e3020
root [6] b->GetTotalSize()
(long long) 447741
That's what my plot shows. The total branch size, not normalized by entries of course.
I have a feeling that this is not an eicrecon issue. Something in the geometry, or the removal of that filter in npsim? We could check whether the FULL file sizes also differ.
jug_dev> wdconinc@menelaos:~/git/epic$ root -q root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.11.1/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root -e 'for (auto b : *events->GetListOfBranches()) { cout << events->GetBranch(b->GetName())->GetTotalSize() << " " << b->GetName() << endl; }' | sort -n | tail -n 10
153654 _B0ECalClusters_hitContributions
170114 _EcalEndcapPInsertTruthClusters_hitContributions
192479 _CentralTrackerMeasurements_weights
208247 _EcalEndcapPInsertTruthClusters_shapeParameters
293396 _EcalEndcapPTruthClusters_hitContributions
353739 _HcalFarForwardZDCClustersBaseline_hitContributions
371514 _HcalFarForwardZDCTruthClusters_hitContributions
391451 _HcalEndcapPInsertClusters_hitContributions
447741 _EcalEndcapPTruthClusters_shapeParameters
501811 _HcalFarForwardZDCClusters_hitContributions
jug_dev> wdconinc@menelaos:~/git/epic$ root -q root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.10.0/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root -e 'for (auto b : *events->GetListOfBranches()) { cout << events->GetBranch(b->GetName())->GetTotalSize() << " " << b->GetName() << endl; }' | sort -n | tail -n 10
152939 _EcalEndcapPClusters_hitContributions
154035 _CentralCKFTrajectoriesUnfiltered_measurementChi2
162892 _EcalEndcapPInsertTruthClusters_hitContributions
192343 _CentralTrackerMeasurements_weights
192828 _B0ECalClusters_hitContributions
298532 _EcalEndcapPTruthClusters_hitContributions
385423 _HcalEndcapPInsertClusters_hitContributions
406743 _HcalFarForwardZDCClustersBaseline_hitContributions
418713 _HcalFarForwardZDCTruthClusters_hitContributions
563399 _HcalFarForwardZDCClusters_hitContributions
I think it must be the particle threshold. Likely our definitions of truth clusters are inadequate. In addition, hitContributions relation indices may have more unique values now.
I have a wild proposal to exclude truth clusters from this campaign's branches.
Ah, there it is (not clusters or threshold).
jug_dev> wdconinc@menelaos:~/git/epic$ root -q root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.10.0/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root -e 'for (auto b : *events->GetListOfLeaves()) { if (events->GetBranch(b->GetName()) == nullptr) continue; cout << events->GetBranch(b->GetName())->GetTotalSize() << " " << b->GetName() << endl; }' | sort -n | tail -n 10
3620446 HcalFarForwardZDCSubcellHits.dimension.y
3620446 HcalFarForwardZDCSubcellHits.dimension.z
3620466 HcalFarForwardZDCSubcellHits.energyError
3620640 _HcalFarForwardZDCSubcellHits_rawHit.index
3621339 _HcalFarForwardZDCSubcellHits_rawHit.collectionID
4233603 _MCParticlesHeadOnFrameNoBeamFX_parents.index
4233769 _MCParticlesHeadOnFrameNoBeamFX_daughters.index
4234170 _MCParticlesHeadOnFrameNoBeamFX_parents.collectionID
4234336 _MCParticlesHeadOnFrameNoBeamFX_daughters.collectionID
7231483 HcalFarForwardZDCSubcellHits.cellID
jug_dev> wdconinc@menelaos:~/git/epic$ root -q root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.11.1/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root -e 'for (auto b : *events->GetListOfLeaves()) { if (events->GetBranch(b->GetName()) == nullptr) continue; cout << events->GetBranch(b->GetName())->GetTotalSize() << " " << b->GetName() << endl; }' | sort -n | tail -n 10
3076693 HcalFarForwardZDCSubcellHits.dimension.y
3076693 HcalFarForwardZDCSubcellHits.dimension.z
3076713 HcalFarForwardZDCSubcellHits.energyError
3076805 _HcalFarForwardZDCSubcellHits_rawHit.index
3077071 _HcalFarForwardZDCSubcellHits_rawHit.collectionID
6146147 HcalFarForwardZDCSubcellHits.cellID
1378856429 _MCParticlesHeadOnFrameNoBeamFX_parents.index
1378856677 _MCParticlesHeadOnFrameNoBeamFX_daughters.index
1378857283 _MCParticlesHeadOnFrameNoBeamFX_parents.collectionID
1378857531 _MCParticlesHeadOnFrameNoBeamFX_daughters.collectionID
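Joining the two listings above by leaf name gives the growth factors directly. The sketch below parses lines in the same `size name` format the one-liner prints. It then reports new/old ratios for names present in both listings. It does not normalize by entry count (509 vs 531), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Parse "size name" lines, as printed by the ROOT one-liner, into name -> size.
std::map<std::string, std::int64_t> parseSizes(const std::string& text) {
    std::map<std::string, std::int64_t> sizes;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::int64_t size = 0;
        std::string name;
        if (fields >> size >> name) sizes[name] = size;  // malformed lines are skipped
    }
    return sizes;
}

// For each name present in both listings, the ratio newSize / oldSize.
std::map<std::string, double> growthRatios(const std::map<std::string, std::int64_t>& oldSizes,
                                           const std::map<std::string, std::int64_t>& newSizes) {
    std::map<std::string, double> ratios;
    for (const auto& [name, newSize] : newSizes) {
        auto it = oldSizes.find(name);
        if (it != oldSizes.end() && it->second > 0)
            ratios[name] = static_cast<double>(newSize) / static_cast<double>(it->second);
    }
    return ratios;
}
```

For `_MCParticlesHeadOnFrameNoBeamFX_parents.index`, the ratio is about 326. The `HcalFarForwardZDCSubcellHits` leaves actually shrink slightly, with a ratio of about 0.85.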
Proposal: disable MCParticlesHeadOnFrameNoBeamFX until fixed, but don't hold up production.
I wonder if this is a dd4hep-1.30 change.
I don't quite understand what causes those numbers. We could filter for status==1 and/or unset cross collection relations to parents/daughters.
https://github.com/eic/EICrecon/blob/main/src/algorithms/reco/UndoAfterBurner.cc
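This is a sketch of the two mitigations mentioned here. It keeps only particles with status == 1 and drops parent/daughter links, so the transformed copy carries no cross-collection relations. It works on a simplified, hypothetical stand-in for the particle record, not the actual podio/EDM4hep classes. `Particle` and `keepFinalStateWithoutRelations` are illustrative names only; a real change would go through the MCParticle collection API in UndoAfterBurner.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical stand-in for an MC particle record: relations are
// stored as indices into the same vector.
struct Particle {
    int status = 0;
    std::vector<int> parents;
    std::vector<int> daughters;
};

// Keep only status == 1 particles and clear their parent/daughter links,
// since the surviving indices would otherwise point into the unfiltered list.
std::vector<Particle> keepFinalStateWithoutRelations(const std::vector<Particle>& in) {
    std::vector<Particle> out;
    for (const Particle& p : in) {
        if (p.status != 1) continue;
        Particle copy = p;
        copy.parents.clear();
        copy.daughters.clear();
        out.push_back(copy);
    }
    return out;
}
```

Clearing the links rather than remapping them is the simpler option. Remapping would need an old-index to new-index table and a decision about what to do with parents that were filtered out.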
I don't even understand what is written in that branch before this change. How can one event contain 2300+ entries in that collection?
root [2] events->Scan("_MCParticlesHeadOnFrameNoBeamFX_parents.index")
...
* 6 * 2317 * 18 *
* 6 * 2318 * 19 *
* 6 * 2319 * 19 *
* 6 * 2320 * 38 *
* 6 * 2321 * 38 *
* 7 * 0 * 0 *
* 7 * 1 * 0 *
* 7 * 2 * 0 *
* 7 * 3 * 4 *
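To put the Scan output in perspective, the per-leaf byte totals listed earlier can be converted into an approximate number of relation entries per event. This assumes 4 bytes per `Int_t` index and ignores basket and key overhead, so the results are slight overestimates. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough entries per event for a variable-length Int_t leaf: total bytes over
// (events * bytes per entry). Overhead is ignored, so this overestimates slightly.
double entriesPerEvent(std::int64_t totalBytes, std::int64_t events,
                       std::int64_t bytesPerEntry = 4) {
    assert(events > 0 && bytesPerEntry > 0);
    return static_cast<double>(totalBytes) / static_cast<double>(events * bytesPerEntry);
}
```

For `_MCParticlesHeadOnFrameNoBeamFX_parents.index`, this gives roughly 2,080 index entries per event in 24.10.0 (4233603 bytes, 509 events). That is in line with the 2300+ instances visible for event 6 above. In 24.11.1 (1378856429 bytes, 531 events) it gives roughly 649,000 per event.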
This seems like the clone breaks the parent/daughter connections and creates a new collection of parent/daughters, and then stores those too. In 24.11.1 it just got quadratically worse.
But I don't see an actual object clone. And I don't think it would know to clone into a specific collection.
There's a 25 byte increase per character in each collection name, probably not the main concern but interesting.
I know about that, but inside that clone the relations are just copied; the referenced objects are not cloned. Thus MCParticlesHeadOnFrameNoBeamFX should not be any larger than MCParticles.
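The argument can be illustrated with a simplified model that uses hypothetical types, not podio's actual implementation. If cloning a particle copies only its relation references (index plus collection ID) and never clones the referenced objects, the cloned collection holds exactly as many relation entries as the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference to an object: position in a collection plus that collection's ID.
struct ObjectRef {
    int index;
    std::uint32_t collectionID;
};

// Hypothetical particle record with kinematics and relation lists.
struct ParticleData {
    double momentum[3];
    std::vector<ObjectRef> parents;
    std::vector<ObjectRef> daughters;
};

// Shallow clone: copy the data and the references; referenced particles are not
// cloned, so the references still point into the original collection.
std::vector<ParticleData> shallowClone(const std::vector<ParticleData>& original) {
    std::vector<ParticleData> clone;
    clone.reserve(original.size());
    for (const ParticleData& p : original) {
        clone.push_back(p);  // copies kinematics and the reference lists as-is
    }
    return clone;
}

// Total number of parent + daughter references stored in a collection.
std::size_t relationCount(const std::vector<ParticleData>& coll) {
    std::size_t n = 0;
    for (const auto& p : coll) n += p.parents.size() + p.daughters.size();
    return n;
}
```

Under this model, a relation branch hundreds of times larger than the original cannot come from the clone alone. It would have to come from something appending references elsewhere.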
The 25-byte increase per character would make sense, since we went from 2 cycles to 27.
Comparing the byte size of individual branches of campaign output between 24.11.1 and 24.10.0
[Blue] root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.11.1/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
[Orange] root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.10.0/epic_craterlake/DIS/NC/18x275/minQ2=1/pythia8NCDIS_18x275_minQ2=1_beamEffects_xAngle=-0.025_hiDiv_5.1853.eicrecon.tree.edm4eic.root
Expected Result:
Would expect them to be similar.
Actual Result:
Huge increase in 24.11.1.