Proposal for califa data processing

Mapped Data:

This will become an associative TClonesArray with the tuple (CrystalId, Range) serving as the key. CrystalId will go from one to 2432, Range will be an enum containing HIGH_GAIN and LOW_GAIN.

If I understood well, this would require a tuple per experiment, that depends of the configuration of Califa. It's not that a duplication of Califa mapping container? Could we for example go for a 1 to 2432 configuration and add a GetRange() in the mapping parameter?

Cal Data

I would prefer the alternative concept. ToT energy is only filled if a ToT calibration was used. You suggest four entries per crystal in Iphos, but maybe we can prepare for future and enable this four entries in all crystals. So we could have crystal Id's that go from 1- 2432, a range given by the container and 4 entries {HIGH_GAIN, LOW_GAIN} x {AMPLITUDE, TOT}. If this is stored anyways, does it make sense to have a range key?

R3BCalifaCrystalCal2Cluster

The different methods of determining if two crystals are neighbors will be moved to separate classes

Does this mean that different windows are different classes, for example? I think this could be a bit extreme, since we could use just different methods inside the same class. I don't know if it is worth it to wrap the clustering in a lot of layers...

Optionally, making the hits unique per crystalID may be separated to an interface class. We will have up to four different candidates for an energy measurement. Rules include:

Prefer AMPLITUDE to TOT

For AMPLITUDE signals, the LOW_GAIN energy will determine which signal gets used. If it is below some threshold, the HIGH_GAIN signal will be used.

For TOT signals, always prefer LOW_GAIN.

Rather that prefer AMPLITUDE I would just set an option (someone could be interested in making ToT clusters). Same goes for "always prefer LOW_GAIN in ToT". The rest is quite similar to what I planned for the actual clustering.

The crystal ID of the hit which was used as a pivot element shall be the key of the cluster entry. This key will be stored with an otherwise zero field fClusterId in CalData.

Here I would just include an std list with the crystals that are inside the cluster, being the first one this mother crystal.

R3BCalifaClusterData

Here I agree about the new name, but I don't have make my mind yet if it is better to have two data structures (Cluster for basic reconstruction + new structure for further processing (lorentz, randomization etc)) or just one as you suggest. If we leave empty fields that would make things slower for online users, for example, as they would have to move empty data they are not going to use.

DirectionSmearer (TODO)

Please do not use simple square windows to do this. I already tried in the past and this is only valid for the most backward crystals and if they are exactly aligned with respect to the target (never). For the rest of the cases, specially Iphos, a square randomization produces disgusting artifacts in the spectrum when things start overlapping.

QPIDParser (TODO)

I would not assume that all punch throughs are coming from protons, we have a deuterium proposal for 2024.

If I understood well, this would require a tuple per experiment, that depends of the configuration of Califa. It's not that a duplication of Califa mapping container? Could we for example go for a 1 to 2432 configuration and add a GetRange() in the mapping parameter?

The problem is the iPhos crystals, where we have both ranges. If we already have to support some crystals existing with both ranges, I think it is reasonable to support it for all crystals.

The assignment of single range preamp crystals to either low gain or high gain should be done by the R3BReader (e.g. not in ucesb). I see two possibilities here:

lookup based on parameters
lookup based on slow control events generated from periodic EPICS readouts of the preamplifiers.

The latter has the advantage that we will use more accurate data.

I would prefer the alternative concept. ToT energy is only filled if a ToT calibration was used. You suggest four entries per crystal in Iphos, but maybe we can prepare for future and enable this four entries in all crystals. So we could have crystal Id's that go from 1- 2432, a range given by the container and 4 entries {HIGH_GAIN, LOW_GAIN} x {AMPLITUDE, TOT}. If this is stored anyways, does it make sense to have a range key?

There are two conflicting goals:

To make it easy to plot quantities related to a single crystal, one would like to have all the quantities for one physical crystal in the same data structure:
- per gain:
  - fEnAmp
  - fNf, fNs
  - fEnTot
  - fWRTS
- furthermore, before clustering one will have to select exactly one method for energy (and optionally one for QPID). If we also want to reflect this in the same data structure, this will be three more fields.
To keep the data structures simple, one would prefer not to have many different fields in them. Easier to create one energy histogram per TCA entry than three different ones.

Does this mean that different windows are different classes, for example? I think this could be a bit extreme, since we could use just different methods inside the same class. I don't know if it is worth it to wrap the clustering in a lot of layers...

I think the present way with plenty of methods members in one class and plenty of cases in the switch statement is not great. auto c=new R3BCalifaCrystalCal2Cluster(new R3BCalifaRectNeighborhood(0.25, 0.25)) does not seem so bad to me.

Each Neighborhood definition has their own parameters (theta, phi, cone opening angle, radius, ...) and their own method to test if two crystals are "nearby". IMHO, that is plenty for a class already without going to overengineering territory.

(If the classes are supposed to be usable from macros, they will all need their propper ClassDef and LinkDef lines, which will be kind of a bother).

If we leave empty fields that would make things slower for online users, for example, as they would have to move empty data they are not going to use.

I will grant that it would at least require a bit more memory. My approach is mostly "if there is a 1:1 mapping between two layers, put them in the same Data class". (Notable exception for Mapped and Cal, because we mostly do not want to plot Mapped vs Cal.) The problem is that if you have them in different levels, then the process to plot quantities against each other becomes more complicated: evt->Draw("Bar.fEnCMS:Foo.fEnReco", "", "Bar.fClusterId==Foo.fClusterId") or even (with associative lookup wrappers)

 for (auto& b: BarTCA) 
      h->Fill(b.fEnCMS, FooTCA[b.fClusterId].fEnReco);

is more complex (and expensive to run) than evt->Draw("Foo.fEnCMS:Foo.fEnReco")

Further downsides of the single data approach are:

If one wants to run multiple versions of Foo2Bar at the same time, one has to create a copy of each input TCA of type Foo for each version.
If the input TCA is read from file, a copy with the fields set by Foo2Bar has to to be written out (whereas friend trees could be used otherwise for subsequent analysis).

Personally, I think the the convenience of being able to easily plot stuff from different "sub-levels" of the analysis against each other is more important than having a design which is optimized from the start. PhD students do not follow Moores Law. If we need to process Petabytes of data taken from months of beam time at some point, there is nothing stopping us from putting all the required analysis steps between lmd and histograms in one FairRun and processing it with minimal overhead. For smaller analysis tasks where we try to optimize settings, I worry much more about wasting brain power than CPU power, or about plots not being done because they are difficult to do.

I would not assume that all punch throughs are coming from protons, we have a deuterium proposal for 2024.

True. We will not be able to differentiate both punch-throughs in CALIFA alone though.

At the moment the code is mostly used for photons (e.g. Doppler as a special case of the Lorentz boost). For charged particles, we could have information from the tracking detectors: a better sense of the momentum direction in the lab frame, plus energy loss measurements in some tracking detector.

This would go in between QPID and EnReco. So the QPID sets fPDG to a special value meant punch-through, then the fPDG (and momentum direction) can be updated based on the data from both califa and the tracking detectors (which are probably a bit experiment-specific) and then the EnReco task can use that particle to calculate the energy from delta E.

In experiments without an energy loss in the tracking detector, I think it would be reasonable to assume that any punch through is a proton. (I am not particularly happy about splitting QPID and EnReco. A different method would be to keep QPID and Energy reconstruction in one Task, but have a hook for a custom class which can apply tracking detector information between these steps.)

Edit: fixed missing line start in quotation.

R3BRootGroup / R3BRoot

Proposal for califa data processing #721