cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

review content saved in miniAOD in DeDxHitInfo to possibly reduce the size on file #36405

Open slava77 opened 2 years ago

slava77 commented 2 years ago

Following a recent update to the dE/dx data selection in #36225, I think it's reasonable to review whether all the data saved in DeDxHitInfo in miniAOD is necessary. This data is saved for only a small fraction of tracks, but its size turns out to be close to 0.4 kB per track (averaged over 100 events from workflow 136.793, DoubleEG Run2017C).

DeDxHitInfo is a per-hit vector of the following data, which after compression adds up to 36 bytes per hit (as seen in wf 136.793).
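As a back-of-the-envelope check (our own arithmetic, not part of the measurement), the two quoted numbers imply roughly 11 stored hits per track on average. The sketch below also gives an uncompressed upper-bound estimate of the per-track saving if some 32-bit float fields per hit were stored as 16-bit halves. The number of such fields is a hypothetical parameter, since the per-field list is not reproduced here, and compression will change the real saving.

```cpp
#include <cassert>
#include <cstddef>

// Average number of stored hits per track implied by the quoted sizes
// (~400 bytes per track, 36 bytes per hit after compression).
double impliedHitsPerTrack(double bytesPerTrack, double bytesPerHit) {
  assert(bytesPerHit > 0.0);
  return bytesPerTrack / bytesPerHit;
}

// Hypothetical, uncompressed estimate: bytes saved per track if
// `nFloatFields` 32-bit floats per hit were stored as 16-bit halves.
// Each such conversion saves 2 bytes; compression is ignored.
double float16SavingPerTrack(double hitsPerTrack, std::size_t nFloatFields) {
  const double bytesSavedPerField = 4.0 - 2.0;
  return hitsPerTrack * static_cast<double>(nFloatFields) * bytesSavedPerField;
}
```

For example, `impliedHitsPerTrack(400.0, 36.0)` gives about 11.1 hits per track.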

Some of the data could be truncated or zeroed out if it is not used. @cms-sw/xpog-l2 @cms-sw/tracking-pog-l2 please check and comment (or redirect to experts) on whether the data reduction can be done.

slava77 commented 2 years ago

assign reconstruction,xpog

cmsbuild commented 2 years ago

New categories assigned: xpog,reconstruction

@slava77,@jpata,@mariadalfonso,@gouskos,@fgolf you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 2 years ago

A new Issue was created by @slava77 Slava Krutelyov.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

mmusich commented 2 years ago

Tagging a (semi-)random list of people who might be interested in following this (and might also provide some insight):

@tvami @echabert @carolinecollard @dvannerom

mmusich commented 2 years ago

It seems the idea hasn't taken off so far, so let me try to start the discussion.

Some of the data could be truncated or zeroed out if it is not used.

tvami commented 2 years ago

Here is a summary from the HSCP team:

Let me tag other people who might care about this package: @ViktorKutzner @ssekmen @dvannerom @kai-wei @srimanob @lowette @kdipetri @SlavaValouev

mmusich commented 2 years ago

@tvami thanks for the reply:

  • An option may be to use float16 ("half") instead of "float (32)". float16 is already used in cmssw.

Would somebody from your group be available to study the effect of reducing to float16?
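One way to study the float16 question without changing the data format is to round the stored floats to the precision they would keep and re-run the analysis. The sketch below is a standalone helper written for illustration. `reduceMantissa` and `maxRelativeError` are hypothetical names, not a cmssw API; for the actual study, cmssw's existing half-precision helpers (e.g. `MiniFloatConverter` in `DataFormats/Math/interface/libminifloat.h`) would be the natural choice. Note the assumption spelled out in the comments: keeping 10 mantissa bits reproduces the relative precision of IEEE half, but not its reduced exponent range.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Round a float32 to `bits` explicit mantissa bits (round half away from zero),
// keeping the float32 exponent. With bits = 10 this emulates the relative
// precision of IEEE half (float16), but NOT its limited exponent range.
// NaN/Inf inputs are not treated specially.
float reduceMantissa(float x, int bits) {
  assert(bits >= 0 && bits <= 23);
  const int drop = 23 - bits;  // low mantissa bits to discard
  if (drop == 0) return x;
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);                     // well-defined type punning
  const std::uint32_t half = 1u << (drop - 1);       // half an ULP at the new precision
  const std::uint32_t mask = ~((1u << drop) - 1u);   // clears the dropped bits
  u = (u + half) & mask;  // a carry into the exponent is the correct rounding
  float y;
  std::memcpy(&y, &u, sizeof y);
  return y;
}

// Largest relative error introduced by the rounding over a set of sample values.
double maxRelativeError(const std::vector<float>& values, int bits) {
  double worst = 0.0;
  for (float v : values) {
    if (v == 0.0f) continue;  // relative error is undefined at zero
    const double rounded = reduceMantissa(v, bits);
    const double rel = std::fabs(rounded - v) / std::fabs(static_cast<double>(v));
    if (rel > worst) worst = rel;
  }
  return worst;
}
```

With `bits = 10` the relative error stays below 2^-11 (about 5e-4); whether that is acceptable for the dE/dx estimators is exactly what the study would need to show.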

  • As far as I know, it is used to check that the hit is within the region of interest (typically excluding the edges)

Can't it be checked, e.g., using the barycenter of the corresponding cluster? Do you really need post-CPE precision?
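To illustrate the barycenter suggestion: a charge-weighted barycenter can be computed from the raw strip amplitudes alone, without running the CPE. The sketch below assumes a strip cluster described by its first strip index and its per-strip ADC amplitudes, and places strip centres at `i + 0.5` (an assumed convention). `barycenter`, `isAwayFromEdges`, the module strip count and the edge margin are hypothetical names and values for illustration, not the cuts actually used in the HSCP analysis.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Charge-weighted barycenter of a strip cluster, in strip units.
// firstStrip: index of the first strip in the cluster.
// amplitudes: ADC counts of consecutive strips.
// Returns -1 if the cluster carries no charge.
float barycenter(std::uint16_t firstStrip, const std::vector<std::uint8_t>& amplitudes) {
  double sumQ = 0.0;
  double sumQx = 0.0;
  for (std::size_t i = 0; i < amplitudes.size(); ++i) {
    const double q = amplitudes[i];
    sumQ += q;
    sumQx += q * (firstStrip + i + 0.5);  // assumed strip-centre convention
  }
  return sumQ > 0.0 ? static_cast<float>(sumQx / sumQ) : -1.0f;
}

// Hypothetical fiducial cut: accept the hit only if its barycenter lies at
// least `marginStrips` strips away from both ends of a module with `nStrips` strips.
bool isAwayFromEdges(float bary, int nStrips, float marginStrips) {
  return bary >= marginStrips && bary <= static_cast<float>(nStrips) - marginStrips;
}
```

This only covers the measured coordinate of a strip module; a real cut would also need the module geometry, which is available from the stored DetId.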

Consequently it's safer to keep them.

This is not the right approach. Let's study what is really needed and then trim it down to the bare minimum, please. For example, could the code that calculates your cluster shape variable be moved upstream, and one (or more) user-float(s) be added to the data format? About the saturated strips: do you need to know just whether there are one or more, or their exact locations?
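The saturation question matters for the size: a count fits in one byte per hit, whereas exact positions in general need the full amplitude vector. The sketch below shows an intermediate option, a count plus a small position bitmask. It assumes the SiStrip convention that stored 8-bit amplitudes of 254 and 255 flag saturated strips (to be checked against the current strip readout/zero-suppression); `SaturationSummary` and `summarizeSaturation` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compact per-hit summary of strip saturation.
struct SaturationSummary {
  std::uint8_t nSaturated = 0;     // number of saturated strips (capped at 255)
  std::uint16_t positionMask = 0;  // bit i set if strip i of the cluster is saturated
                                   // (only the first 16 strips are recorded)
};

// Assumption: amplitudes >= threshold (254 by default) mark saturated strips.
SaturationSummary summarizeSaturation(const std::vector<std::uint8_t>& amplitudes,
                                      std::uint8_t threshold = 254) {
  SaturationSummary s;
  for (std::size_t i = 0; i < amplitudes.size(); ++i) {
    if (amplitudes[i] < threshold) continue;
    if (s.nSaturated < 255) ++s.nSaturated;
    if (i < 16) s.positionMask |= static_cast<std::uint16_t>(1u << i);
  }
  return s;
}
```

If only "at least one saturated strip" is needed, `nSaturated > 0` reduces this further to a single bit; a cluster shape variable could likewise be computed upstream and stored as one user-float.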

SiPixelCluster: this is needed as is, to pass it to the CPE to extract the probQ / probXY values

In light of https://github.com/cms-sw/cmssw/pull/36247, that could also be moved upstream, saving directly the quantities needed for the analysis.
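As an example of storing a derived quantity instead of the clusters: if the analysis needs a per-track combination of the per-hit pixel probQ values, that number could be computed upstream and stored as a single float. The sketch below uses the standard combination for a product of n independent probabilities, each uniform under the null hypothesis, P_comb = P · Σ_{k=0}^{n-1} (−ln P)^k / k!, with P = ∏ p_i. Whether this matches what the HSCP analysis actually does is an assumption to be checked; `combineProbabilities` is a hypothetical helper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Combine n independent probabilities p_i into the probability of observing
// a product <= P = prod p_i when each p_i is uniform in (0, 1]:
//   P_comb = P * sum_{k=0}^{n-1} (-ln P)^k / k!
// Returns -1 for an empty input or if any p_i lies outside (0, 1].
double combineProbabilities(const std::vector<double>& probs) {
  if (probs.empty()) return -1.0;
  double logP = 0.0;  // accumulate ln P to avoid underflow of the product
  for (double p : probs) {
    if (!(p > 0.0 && p <= 1.0)) return -1.0;
    logP += std::log(p);
  }
  const double x = -logP;  // x >= 0
  double term = 1.0;       // x^k / k!, starting from k = 0
  double sum = 1.0;
  for (std::size_t k = 1; k < probs.size(); ++k) {
    term *= x / static_cast<double>(k);
    sum += term;
  }
  return std::exp(logP) * sum;
}
```

The design point is the size: one float per track replaces the per-hit SiPixelCluster payload, at the cost of fixing the combination recipe at production time.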

dvannerom commented 2 years ago

Hello Marco,

Sorry I missed this email, which is actually very important for us FCP analyzers.

Cheers, David

jpata commented 2 years ago

type tracking

tvami commented 2 years ago

type trk

Isn't this more for the tracking POG than for the tracker DPG, i.e. type tracking? To my knowledge the DPG doesn't use this, although with the current implementation they could possibly use it... If we strip it down, I guess that's no longer true.