CYGNUS-RD / reconstruction

Camera and scope analysis tools

Changes for the Reduced Pixels event content #172

Closed emanueledimarco closed 2 years ago

emanueledimarco commented 2 years ago

This is a change to the event content of the reconstruction which affects only, but significantly, the people using the pixels inside the trees, especially @GioDho, Samuele, @baracch.

-- Before: If the scfullinfo was set to true =>

-- Now: The idea, discussed at the reco meeting, was to save only the pixels passing some selection. To implement this, I had to change the way they are saved (a unique array on the SC block would have holes, so there would be no way to reduce the event size). So now the pixels are saved only for the clusters passing the selection, attached per cluster.
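As a rough illustration of what this kind of per-cluster saving can look like on the analysis side (the array and variable names below are hypothetical, not the actual branch names of this PR), the pixels of all selected clusters can be concatenated into flat arrays, with one pixel count per cluster, so there are no holes and the size scales with the selection:

```python
import numpy as np

# Hypothetical illustration: the surviving pixels of all selected clusters are
# concatenated into flat per-event arrays, and each cluster records how many
# pixels belong to it (no fixed-size array, hence no holes).
redpix_x   = np.array([101, 102, 102, 340, 341])   # flat x coordinates
redpix_y   = np.array([ 55,  55,  56, 210, 210])   # flat y coordinates
redpix_z   = np.array([ 12,   9,  11,  25,  30])   # flat counts
sc_nredpix = np.array([3, 2])                       # pixels per selected cluster

# Offline, the pixel block of cluster i is recovered by slicing the flat arrays:
starts = np.concatenate(([0], np.cumsum(sc_nredpix)[:-1]))
for i, (s, n) in enumerate(zip(starts, sc_nredpix)):
    print(f"cluster {i}: x={redpix_x[s:s+n]}, z={redpix_z[s:s+n]}")
```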

This changes the offline analysis for the people concerned, so I will wait to merge it until they confirm that this is OK for them. @davidepinci, please also check it conceptually.

Now, about the event size (this also concerns @gmazzitelli, for the cloud storage of the RECO output). I did some checks on the event content with different selections/configurations. A full report on the size can be found in the links. Estimating the size on run 4152 (iron + typical LNF background @ 440 V), the summary is:

If you let us know that this is OK, @davidepinci and I can start a full production of the trees.

GioDho commented 2 years ago

I did not understand a couple of things.

  1. When a cluster passes the selection, are all of its pixels saved? I mean, for a cluster that survived the cut, will it look exactly the same as now? (I am just talking about which pixels of the supercluster are saved, not about the saving structure).

  2. Are the non-zero-suppressed and zero-suppressed pixels both saved? Because for the iron dimension I use the non-zero-suppressed, while I know Samuele uses the zero-suppressed. So, are both saved, or is there a flag that allows one to save both, just the zero-suppressed, or nothing?

  3. I have a suggestion. Why don't we multiply the z value by 100 (or 10) and then round to an integer? This way the space would still be saved, since we would be storing integers, but we would keep a certain precision. We just need to remember to divide by 100 when analysing.
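A minimal roundtrip sketch of this fixed-point idea (purely illustrative, not the reconstruction code):

```python
# Fixed-point storage of z: scale, round to an integer for the tree, and
# divide back by the same factor at analysis time.
SCALE = 100  # 100 keeps two decimals, 10 keeps one

def encode_z(z, scale=SCALE):
    return int(round(z * scale))   # what would be written to the tree

def decode_z(z_int, scale=SCALE):
    return z_int / scale           # what the analysis would use

z = 12.3456
z_stored = encode_z(z)     # -> 1235
print(decode_z(z_stored))  # -> 12.35, two-decimal precision preserved
```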

emanueledimarco commented 2 years ago
> 1. When a cluster passes the selection, are all of its pixels saved? I mean, for a cluster that survived the cut, will it look exactly the same as now? (I am just talking about which pixels of the supercluster are saved, not about the saving structure).

Yes, the contents are the same (for each pixel, it saves x, y, z). For the z, I would round it to an integer, to halve the size for free. I didn't save the "ID", which I think you used to link the cluster to the pixels; that should not be necessary (but let me know).

> 2. Are the non-zero-suppressed and zero-suppressed pixels both saved? Because for the iron dimension I use the non-zero-suppressed, while I know Samuele uses the zero-suppressed. So, are both saved, or is there a flag that allows one to save both, just the zero-suppressed, or nothing?

I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

> 3. I have a suggestion. Why don't we multiply the z value by 100 (or 10) and then round to an integer? This way the space would still be saved, since we would be storing integers, but we would keep a certain precision. We just need to remember to divide by 100 when analysing.

Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.
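A small numerical illustration of that point (made-up counts): the pedestal is the mean of integer counts over n events, so it is generally non-integer, and the subtracted z inherits the fractional part:

```python
import numpy as np

# Made-up numbers: the same pixel in n pedestal events has integer counts,
# but their mean (the pedestal) is generally non-integer.
pedestal_counts = np.array([99, 101, 100, 103, 98])   # integer ADC counts
pedestal = pedestal_counts.mean()                      # 100.2

raw_count = 112                                        # integer count in the event
z = raw_count - pedestal                               # ~11.8: non-integer only
print(pedestal, z)                                     # because of the subtraction
```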

@GioDho @davidepinci @baracch

I'll send a mail to the reconstruction list; I think this is not reaching many people.

GioDho commented 2 years ago

> Yes, the contents are the same (for each pixel, it saves x, y, z). For the z, I would round it to an integer, to halve the size for free. I didn't save the "ID", which I think you used to link the cluster to the pixels; that should not be necessary (but let me know).

Perfect. Actually, we created the ID to number them. We were also applying some cuts, and a cluster which did not pass had ID=-1 and only one pixel saved. So the idea is the same, only now it would be better implemented. I agree that we do not need the ID.

> I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

That seems logical. So offline one would need to take the corresponding pedestal again and, if necessary, remove some pixels. It will complicate the offline analysis, but I think it is doable, considering how much heavier the files would become if both were saved.

> Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.

Actually you are right, there is likely no reason to keep the precision. I forgot that the z is made of counts. Maybe this is something that can be tested on some MC data to see the difference, and then revert to one-digit precision if the results change.

emanueledimarco commented 2 years ago

> I save the non-zero-suppressed pixels (i.e. all of them). Saving the zero-suppressed ones in addition seems a duplication to me; one can zero-suppress offline.

> That seems logical. So offline one would need to take the corresponding pedestal again and, if necessary, remove some pixels. It will complicate the offline analysis, but I think it is doable, considering how much heavier the files would become if both were saved.

Yes, I understand the extra complication. If Samuele thinks this is a big problem, we can see if we can do a search between the two pixel collections attached to the cluster (zero-suppressed and not) and save an extra bool flag for each pixel (ZS or not). But the double loop or the find can cost some CPU. Otherwise, maybe he can survive with one of the following:

  1. making an approximate ZS offline (threshold at some fixed value; see the sketch after this list)
  2. re-using the pedestal map to do exactly the same thing as the reconstruction
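A minimal sketch of option 1, an approximate offline zero suppression on the saved reduced pixels (the threshold value is illustrative, not the one used in the reconstruction):

```python
import numpy as np

# Approximate offline zero suppression: keep only the reduced pixels whose
# (pedestal-subtracted) counts are above a fixed threshold. Option 2 would
# instead recompute the cut pixel by pixel from the pedestal map.
def zero_suppress(x, y, z, threshold=1.0):   # threshold value is illustrative
    keep = z > threshold
    return x[keep], y[keep], z[keep]

x = np.array([101, 102, 102, 340, 341])
y = np.array([ 55,  55,  56, 210, 210])
z = np.array([0.4, 9.0, 11.0, 0.2, 30.0])
print(zero_suppress(x, y, z))
```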

> Why do you need the one-digit precision? The light counts are integers; the float value comes from the pedestal subtraction, which becomes non-integer only because we average the counts over n events. But OK, if you think you need that precision, we can multiply it.

> Actually you are right, there is likely no reason to keep the precision. I forgot that the z is made of counts. Maybe this is something that can be tested on some MC data to see the difference, and then revert to one-digit precision if the results change.

So OK. For the sake of not having to remember the factor, I tried to save directly the 1-digit-precision float (round(x*10)/10) and let ROOT compression work its magic. The size increases a bit with respect to the integer, but not much:

What do you think? @GioDho @davidepinci @gmazzitelli @baracch
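A minimal illustration of the effect being relied on here, using zlib as a stand-in for ROOT's on-disk compression (illustrative numbers, not the run 4152 figures): once z is rounded to one decimal it takes far fewer distinct values, so the float branch compresses substantially better than a full-precision one.

```python
import zlib
import numpy as np

# Illustrative only: zlib stands in for ROOT's on-disk compression.
rng = np.random.default_rng(0)
z_full   = rng.normal(10, 3, 100_000).astype(np.float32)   # full-precision z
z_1digit = np.round(z_full * 10) / 10                       # round(x*10)/10

for name, arr in [("full float", z_full), ("1-digit float", z_1digit)]:
    compressed = len(zlib.compress(arr.astype(np.float32).tobytes()))
    print(f"{name}: {compressed} bytes after compression")
```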

GioDho commented 2 years ago

As far as the 1-digit precision is concerned, I would say it looks fine. For the zero suppression, I do not think re-using the pedestal mask would be too much of a burden, but I'll wait for Samuele, who uses the zero-suppressed pixels more, to see what he thinks of it. It will also depend on how much time redoing the mask requires, but probably less than using the find operation during the reconstruction.