CYGNUS-RD / reconstruction

Camera and scope analysis tools

Changes for the Reduced Pixels event content #172

Closed emanueledimarco closed 2 years ago

emanueledimarco commented 2 years ago

This is a change to the event content of the reconstruction which affects only, but significantly, the people using the pixels inside the trees, especially @GioDho, Samuele, @baracch.

-- Before: If the scfullinfo was set to true =>

-- Now: The idea, discussed at the reco meeting, was to save only the pixels passing some selection. To implement this, I had to change the way they are saved (a unique array on the SC block would have holes, so there would be no way to reduce the event size). So now the pixels are saved only for the clusters passing the selection, attached per cluster.
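As a rough illustration of what this kind of per-cluster saving can look like on the analysis side (the array and variable names below are hypothetical, not the actual branch names of this PR), the pixels of all selected clusters can be concatenated into flat arrays, with one pixel count per cluster, so there are no holes and the size scales with the selection:

```python
import numpy as np

# Hypothetical illustration: the surviving pixels of all selected clusters are
# concatenated into flat per-event arrays, and each cluster records how many
# pixels belong to it (no fixed-size array, hence no holes).
redpix_x   = np.array([101, 102, 102, 340, 341])   # flat x coordinates
redpix_y   = np.array([ 55,  55,  56, 210, 210])   # flat y coordinates
redpix_z   = np.array([ 12,   9,  11,  25,  30])   # flat counts
sc_nredpix = np.array([3, 2])                       # pixels per selected cluster

# Offline, the pixel block of cluster i is recovered by slicing the flat arrays:
starts = np.concatenate(([0], np.cumsum(sc_nredpix)[:-1]))
for i, (s, n) in enumerate(zip(starts, sc_nredpix)):
    print(f"cluster {i}: x={redpix_x[s:s+n]}, z={redpix_z[s:s+n]}")
```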

This changes the offline analysis for the people concerned, so I will wait to merge it until they confirm that this is OK for them. @davidepinci, please also check it conceptually.

Now, about the event size (this also concerns @gmazzitelli, for the cloud storage of the RECO output). I did some checks on the event content with different selections/configurations. A full report on the size can be found in the links. Estimating the size on run 4152 (iron + typical LNF background @ 440 V), the summary is:

If you let us know that this is OK, @davidepinci and I can start a full production of the trees.

GioDho commented 2 years ago

I did not understand a couple of things.

  1. When a cluster passes the selection, are all of its pixels saved? I mean, for a cluster that survived the cut, will it look exactly the same as now? (I am just talking about which pixels of the supercluster are saved, not about the saving structure).

  2. Are the non-zero-suppressed and zero-suppressed pixels both saved? Because for the iron dimension I use the non-zero-suppressed, while I know Samuele uses the zero-suppressed. So, are both saved, or is there a flag that allows one to save both, just the zero-suppressed, or nothing?

  3. I have a suggestion. Why don't we multiply the z value by 100 (or 10) and then round to an integer? This way the space would still be saved, since we would be storing integers, but we would keep a certain precision. We just need to remember to divide by 100 when analysing.
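A minimal roundtrip sketch of this fixed-point idea (purely illustrative, not the reconstruction code):

```python
# Fixed-point storage of z: scale, round to an integer for the tree, and
# divide back by the same factor at analysis time.
SCALE = 100  # 100 keeps two decimals, 10 keeps one

def encode_z(z, scale=SCALE):
    return int(round(z * scale))   # what would be written to the tree

def decode_z(z_int, scale=SCALE):
    return z_int / scale           # what the analysis would use

z = 12.3456
z_stored = encode_z(z)     # -> 1235
print(decode_z(z_stored))  # -> 12.35, two-decimal precision preserved
```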

emanueledimarco commented 2 years ago
> 1. When a cluster passes the selection, are all of its pixels saved? I mean, for a cluster that survived the cut, will it look exactly the same as now? (I am just talking about which pixels of the supercluster are saved, not about the saving structure).

Yes, the contents are the same (for each pixel, it saves x, y, z). For the z, I would round it to an integer, to halve the size for free. I didn't save the "ID", which I think you used to link the cluster to the pixels; that should not be necessary (but let me know).

> 2. Are the non-zero-suppressed and zero-suppressed pixels both saved? Because for the iron dimension I use the non-zero-suppressed, while I know Samuele uses the zero-suppressed. So, are both saved, or is there a flag that allows one to save both, just the zero-suppressed, or nothing?

I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

> 3. I have a suggestion. Why don't we multiply the z value by 100 (or 10) and then round to an integer? This way the space would still be saved, since we would be storing integers, but we would keep a certain precision. We just need to remember to divide by 100 when analysing.

Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.
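A small numerical illustration of that point (made-up counts): the pedestal is the mean of integer counts over n events, so it is generally non-integer, and the subtracted z inherits the fractional part:

```python
import numpy as np

# Made-up numbers: the same pixel in n pedestal events has integer counts,
# but their mean (the pedestal) is generally non-integer.
pedestal_counts = np.array([99, 101, 100, 103, 98])   # integer ADC counts
pedestal = pedestal_counts.mean()                      # 100.2

raw_count = 112                                        # integer count in the event
z = raw_count - pedestal                               # ~11.8: non-integer only
print(pedestal, z)                                     # because of the subtraction
```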

@GioDho @davidepinci @baracch

I'll send a mail to the reconstruction list; I think this is not reaching many people.

GioDho commented 2 years ago

> Yes, the contents are the same (for each pixel, it saves x, y, z). For the z, I would round it to an integer, to halve the size for free. I didn't save the "ID", which I think you used to link the cluster to the pixels; that should not be necessary (but let me know).

Perfect. Actually, we created the ID to number them. We were also applying some cuts, and a cluster which did not pass had ID=-1 and only one pixel saved. So the idea is the same, only now it would be better implemented. I agree that we do not need the ID.

> I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

That seems logical. So offline one would need to take the corresponding pedestal again and, if necessary, remove some pixels. It will complicate the offline analysis, but I think it is doable, considering how much heavier the files would become if both were saved.

> Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.

Actually you are right, there is likely no reason to keep the precision. I forgot that the z is made of counts. Maybe this is something that can be tested on some MC data to see the difference, and then revert to one-digit precision if the results change.

emanueledimarco commented 2 years ago

> I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

> That seems logical. So offline one would need to take the corresponding pedestal again and, if necessary, remove some pixels. It will complicate the offline analysis, but I think it is doable, considering how much heavier the files would become if both were saved.

Yes, I understand the extra complication. If Samuele thinks this is a big problem, we can see if we can do a search between the two pixel collections attached to the cluster (zero-suppressed and not) and save an extra bool flag for each pixel (ZS or not). But the double loop or the find can cost some CPU. Otherwise, maybe he can survive with one of the following:

  1. making an approximate ZS offline (threshold at some fixed value; see the sketch after this list)
  2. re-using the pedestal map to do exactly the same thing as the reconstruction
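A minimal sketch of option 1, an approximate offline zero suppression on the saved reduced pixels (the threshold value is illustrative, not the one used in the reconstruction):

```python
import numpy as np

# Approximate offline zero suppression: keep only the reduced pixels whose
# (pedestal-subtracted) counts are above a fixed threshold. Option 2 would
# instead recompute the cut pixel by pixel from the pedestal map.
def zero_suppress(x, y, z, threshold=1.0):   # threshold value is illustrative
    keep = z > threshold
    return x[keep], y[keep], z[keep]

x = np.array([101, 102, 102, 340, 341])
y = np.array([ 55,  55,  56, 210, 210])
z = np.array([0.4, 9.0, 11.0, 0.2, 30.0])
print(zero_suppress(x, y, z))
```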

> Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.

> Actually you are right, there is likely no reason to keep the precision. I forgot that the z is made of counts. Maybe this is something that can be tested on some MC data to see the difference, and then revert to one-digit precision if the results change.

So OK. For the sake of not having to remember the factor, I tried to save directly the 1-digit-precision float (round(x*10)/10) and let ROOT compression work its magic. The size increases a bit with respect to the integer, but not much:

What do you think? @GioDho @davidepinci @gmazzitelli @baracch
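A minimal illustration of the effect being relied on here, using zlib as a stand-in for ROOT's on-disk compression (illustrative numbers, not the run 4152 figures): once z is rounded to one decimal it takes far fewer distinct values, so the float branch compresses substantially better than a full-precision one.

```python
import zlib
import numpy as np

# Illustrative only: zlib stands in for ROOT's on-disk compression.
rng = np.random.default_rng(0)
z_full   = rng.normal(10, 3, 100_000).astype(np.float32)   # full-precision z
z_1digit = np.round(z_full * 10) / 10                       # round(x*10)/10

for name, arr in [("full float", z_full), ("1-digit float", z_1digit)]:
    compressed = len(zlib.compress(arr.astype(np.float32).tobytes()))
    print(f"{name}: {compressed} bytes after compression")
```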

GioDho commented 2 years ago

As far as the 1-digit precision is concerned, I would say it looks fine. For the zero suppression, I do not think re-using the pedestal mask would be too much of a burden, but I'll wait for Samuele, who uses the zero-suppressed pixels more, to see what he thinks of it. It will also depend on how much time redoing the mask requires, but probably less than using the find operation during the reconstruction.