eic / EDM4eic

A data model for EIC defined with podio and based on EDM4hep.
https://eic.github.io/EDM4eic/
GNU Lesser General Public License v3.0

Proposal for Waveform and Raw HGCROC Hit #89

Open ruse-traveler opened 4 months ago

ruse-traveler commented 4 months ago

One of the items discussed during the July 2024 ePIC Collaboration Meeting was the set of steps needed to better simulate the HGCROC. One need identified was to make sure we have the data types necessary to capture both the waveform produced by a SiPM and the output of the HGCROC.

Describe the solution you'd like

For the waveform, we could make use of edm4hep::RawTimeSeries:

  edm4hep::RawTimeSeries:
    Description: "Raw data of a detector readout"
    Author: "EDM4hep authors"
    Members:
       - uint64_t cellID  // detector specific cell id
       - int32_t quality               // quality flag for the hit
       - float time [ns]               // time of the hit
       - float charge [fC]             // integrated charge of the hit
       - float interval [ns]           // interval of each sampling
    VectorMembers:
       - int32_t adcCounts          // raw data (32-bit) word at i

But for the HGCROC output, we would need to extend what's already in edm4hep::RawCalorimeterHit. For example:

edm4eic::RawHGCROCHit:
  Description: "Raw hit from an HGCROC"
  Author: "H. G. Croc"
  Members:
    - uint64_t cellID // detector specific (geometrical) cell id
    - int32_t timeOfArrival // ToA value [adc counts]
    - int32_t timeOverThreshold // ToT value [adc counts]
    - int32_t amplitude // amplitude of hit [adc counts]
    - int32_t timeStamp // time stamp for the hit
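
To make the proposed layout easier to experiment with before any podio code is generated, here is a minimal plain-Python stand-in. It is only a sketch: the class is not produced by podio, the snake_case field names are my own renaming of the members listed above (the last one is written as time_stamp, following the timeStamp member of edm4hep::RawCalorimeterHit), and the range checks simply reflect the declared uint64_t / int32_t widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawHGCROCHit:
    """Plain-Python stand-in for the proposed edm4eic::RawHGCROCHit (not podio-generated)."""

    cell_id: int              # detector specific (geometrical) cell id, uint64_t
    time_of_arrival: int      # ToA value [adc counts], int32_t
    time_over_threshold: int  # ToT value [adc counts], int32_t
    amplitude: int            # amplitude of hit [adc counts], int32_t
    time_stamp: int           # time stamp for the hit, int32_t

    def __post_init__(self):
        # Reject values that would not fit the declared integer widths.
        if not 0 <= self.cell_id < 2**64:
            raise ValueError("cell_id does not fit in uint64_t")
        for name in ("time_of_arrival", "time_over_threshold", "amplitude", "time_stamp"):
            value = getattr(self, name)
            if not -(2**31) <= value < 2**31:
                raise ValueError(f"{name} does not fit in int32_t")


# Example values are made up for illustration.
hit = RawHGCROCHit(
    cell_id=0x1234,
    time_of_arrival=57,
    time_over_threshold=210,
    amplitude=812,
    time_stamp=3,
)
```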

Describe alternatives you've considered

One alternative approach would be to save only one of the values from the HGCROC (e.g. amplitude) in edm4hep::RawCalorimeterHit.

veprbl commented 4 months ago

ToA is already part of the edm4hep::RawCalorimeterHit. ToT we could add in our own extension in the edm4eic:: namespace, or even submit upstream. A different question is how to store waveforms between the simulation-type algorithms.

ruse-traveler commented 4 months ago

> ToA is already part of the edm4hep::RawCalorimeterHit

Is it? I'm seeing an amplitude field, but not a ToA field...

  #-------------  RawCalorimeterHit
  edm4hep::RawCalorimeterHit:
    Description: "Raw calorimeter hit"
    Author: "EDM4hep authors"
    Members:
      - uint64_t cellID   // detector specific (geometrical) cell id
      - int32_t amplitude               // amplitude of the hit in ADC counts
      - int32_t timeStamp               // time stamp for the hit

Either way, I'd also be okay with adding the ToT (and ToA if needed) to our extension or upstream!

ruse-traveler commented 4 months ago

> A different question is how to store waveforms between the simulation-type algorithms.

Yeah... I was also wondering about that 😕

The edm4hep::RawTimeSeries gets kind of close to how the waveform output is handled in SimSiPM...

novitzky commented 3 months ago

The data type looks okay; the only thing is that we will have N samples, where N is a settable number. The additional shaper would slightly change the initial SiPM pulse shape seen by the ADC. The TOA would only fire when the signal crosses the threshold from lower to higher values, and the TOT would fire when it crosses the threshold from higher to lower values. The self-trigger would also fire if TOA > 0. As we are sampling at 40 MHz, for very small signals it can happen that ADC == 0 in all samples even though we still triggered. That means the signal was so small, and we were unlucky enough to be out of phase, that we never caught a non-zero ADC value. We just 'know' that this was still a very small signal because we have a TOA > 0. I am still thinking about how this would behave for a double hit, when there are 2 hits in close (or consecutive) bunch crossings. But I guess this can be simulated afterwards.
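
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the triangular pulse shape, the threshold, the ADC step size (adc_lsb) and the fine scan step are all made-up values, ToA and ToT are reported in ns rather than TDC counts, and no shaper is modelled. It shows the upward crossing setting ToA, the downward crossing closing ToT, and how a narrow pulse that falls between two 25 ns clock ticks gives ADC == 0 in every sample while still producing a ToA.

```python
SAMPLING_PERIOD_NS = 25.0  # 40 MHz sampling


def triangular_pulse(t_ns, t_peak_ns=12.5, peak=5.0):
    """Toy pulse (assumption): rises and falls with unit slope around t_peak_ns."""
    return max(0.0, peak - abs(t_ns - t_peak_ns))


def digitize(pulse, threshold, n_samples, phase_ns=0.0, adc_lsb=1.0, step_ns=0.5):
    """Return (adc_samples, toa_ns, tot_ns); toa/tot are None if there is no crossing."""
    # ADC: one sample per 25 ns clock tick, shifted by the clock phase.
    adc = [int(pulse(phase_ns + k * SAMPLING_PERIOD_NS) // adc_lsb) for k in range(n_samples)]

    # ToA / ToT: scan a fine time grid for threshold crossings.
    n_steps = int(n_samples * SAMPLING_PERIOD_NS / step_ns)
    toa = tot = None
    previous = pulse(0.0)
    for i in range(1, n_steps + 1):
        t = i * step_ns
        current = pulse(t)
        if toa is None and previous < threshold <= current:
            toa = t              # crossing from lower to higher values: ToA fires
        elif toa is not None and previous >= threshold > current:
            tot = t - toa        # crossing from higher to lower values: ToT is closed
            break
        previous = current
    return adc, toa, tot


# Pulse peaks at 12.5 ns, halfway between the clock ticks at 0 ns and 25 ns.
adc, toa, tot = digitize(triangular_pulse, threshold=2.0, n_samples=4)
print(adc, toa, tot)             # [0, 0, 0, 0] 9.5 6.5

# Same pulse, but with the clock phase aligned to the peak.
adc_in_phase, _, _ = digitize(triangular_pulse, threshold=2.0, n_samples=4, phase_ns=12.5)
print(adc_in_phase)              # [5, 0, 0, 0]
```

A double hit could be explored with the same function by passing a pulse callable that sums two shifted pulses; as written, the scan stops after the first ToT and would ignore a second crossing.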

mandrenguyen commented 3 months ago

In case it's useful to look at the CMS HGCAL reconstruction, which of course uses the HGCROC, let me post some links here.

The code that transforms the RAW digitized data ("digis") into uncalibrated reconstructed hits is here: https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/HGCalRecAlgos/interface/HGCalUncalibRecHitRecWeightsAlgo.h

In particular you can see that sample.mode() tells you if the ROC was saturated or not, and you can see how the amplitude is extracted in the two cases here: https://github.com/cms-sw/cmssw/blob/29ab9c802e7d765e343be3cdb163e135043f93c8/RecoLocalCalo/HGCalRecAlgos/interface/HGCalUncalibRecHitRecWeightsAlgo.h#L74
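
As a rough illustration of the two-case extraction (not a transcription of the CMSSW code), the branching can be sketched like this. The Sample class, the function name and all calibration constants are hypothetical placeholders; the assumption is that a set mode flag means the ADC saturated and the stored value comes from the ToT path.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for one digitized sample; not the CMSSW class."""

    mode: bool  # True if the ROC saturated (value taken from the ToT path)
    data: int   # raw counts: ADC counts if mode is False, ToT counts if True


def amplitude_fc(sample, adc_lsb_fc=0.25, tot_lsb_fc=2.5, tot_onset_fc=100.0):
    """Convert one sample to charge in fC; every constant here is a placeholder."""
    if sample.mode:
        # Saturated: reconstruct the charge from time-over-threshold above the onset.
        return tot_onset_fc + sample.data * tot_lsb_fc
    # Linear regime: charge is proportional to the ADC counts.
    return sample.data * adc_lsb_fc


print(amplitude_fc(Sample(mode=False, data=200)))  # 50.0
print(amplitude_fc(Sample(mode=True, data=10)))    # 125.0
```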

The uncalibrated hits are calibrated here: https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/HGCalRecAlgos/interface/HGCalRecHitSimpleAlgo.h There's not much to see though, simply a set of weights loaded from somewhere that depend on the layer number.

The relevant data formats are here: https://github.com/cms-sw/cmssw/blob/master/DataFormats/HGCRecHit/interface/HGCUncalibratedRecHit.h https://github.com/cms-sw/cmssw/blob/master/DataFormats/HGCRecHit/interface/HGCRecHit.h

Unfortunately, I don't yet have much useful information about how digis themselves are obtained. What we can see is that for the moment the digis are "faked" from the corresponding sim data format: https://github.com/cms-sw/cmssw/blob/master/EventFilter/HGCalRawToDigi/plugins/HGCalRawToDigiFake.cc

Some information about the HGCAL digi format can be gleaned here: https://github.com/cms-sw/cmssw/tree/master/DataFormats/HGCalDigi/interface

I have seen that there is dedicated code used for HGCAL test beams so presumably there is more useful information there, but I haven't had the chance to dig in there yet.

ruse-traveler commented 3 months ago

Awesome! Thanks, guys! This is extremely useful feedback!

So for the samples, @novitzky, I assume you're referring to the waveform? One way we could handle setting the number of samples is for the algorithm that produces the waveforms to take N_samples as a parameter, so that each waveform coming out of the algorithm has N_samples entries in its vector of ADC values. Users could then retrieve the number of samples with getAdcCounts().size()...
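
A minimal sketch of that idea, in plain Python rather than the podio-generated classes: the RawTimeSeries dataclass below only mirrors the members quoted earlier in this thread, and make_waveform, its parameters and the 25 ns default interval (taken from the 40 MHz sampling mentioned above) are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RawTimeSeries:
    """Plain-Python mirror of the edm4hep::RawTimeSeries members (not podio-generated)."""

    cell_id: int                 # detector specific cell id
    quality: int = 0             # quality flag for the hit
    time: float = 0.0            # time of the hit [ns]
    charge: float = 0.0          # integrated charge of the hit [fC]
    interval: float = 0.0        # interval of each sampling [ns]
    adc_counts: List[int] = field(default_factory=list)  # raw data word at i


def make_waveform(cell_id: int, pulse: Callable[[float], float],
                  n_samples: int, interval_ns: float = 25.0) -> RawTimeSeries:
    """Sample `pulse` n_samples times, so the output always has n_samples entries."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    counts = [int(pulse(k * interval_ns)) for k in range(n_samples)]
    return RawTimeSeries(cell_id=cell_id, interval=interval_ns, adc_counts=counts)


# Toy triangular pulse peaking at 50 ns; the shape is an assumption for the example.
waveform = make_waveform(
    cell_id=42,
    pulse=lambda t: max(0.0, 100.0 - 2.0 * abs(t - 50.0)),
    n_samples=8,
)
print(len(waveform.adc_counts))  # 8, the analogue of getAdcCounts().size()
print(waveform.adc_counts)       # [0, 50, 100, 50, 0, 0, 0, 0]
```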

Also thanks for sharing these links, @mandrenguyen! These will be really valuable reference points while we're putting together the relevant algorithms! I can definitely see how adding some flags for whether or not a channel was saturated would be useful... For my own understanding, are the OOT variables used only for an out-of-time event?