JeffersonLab / hps-java

HPS reconstruction and analysis framework in Java

Allow pulser data to be mixed with MC signal #225

Closed mholtrop closed 2 years ago

mholtrop commented 6 years ago

To simulate backgrounds more accurately and to speed up MC production, we need to be able to merge pulser data with MC data. This requires correctly merging hits at the detector level, not just inserting an MC beam bucket into the pulser data. Care must be taken to get the timing of all the signals correct.
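To make "merging at the detector level" concrete, here is a minimal sketch that adds a simulated pulse to a window of pulser ADC samples on the same channel. It is not the hps-java implementation: the class name, the method signature, the sample values, and the saturation value of 4095 are all assumptions made for illustration. The `offset` argument stands in for the timing alignment mentioned above.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the hps-java merging code. */
final class PulserOverlaySketch {

    /**
     * Adds a simulated pulse to a copy of a pulser ADC window, sample by sample.
     *
     * @param pulserAdc          raw ADC samples from pulser data (pedestal included)
     * @param mcAdcAbovePedestal simulated pulse in ADC counts above pedestal
     * @param offset             index in the pulser window where the first MC sample lands
     * @param maxAdc             ADC saturation value (hypothetical parameter)
     * @return a new array holding the merged samples
     */
    static int[] overlay(int[] pulserAdc, int[] mcAdcAbovePedestal, int offset, int maxAdc) {
        int[] merged = Arrays.copyOf(pulserAdc, pulserAdc.length);
        for (int i = 0; i < mcAdcAbovePedestal.length; i++) {
            int j = offset + i;
            // MC samples that fall outside the pulser window are dropped.
            if (j < 0 || j >= merged.length) {
                continue;
            }
            // Clamp at saturation so the sum stays a valid ADC value.
            merged[j] = Math.min(maxAdc, merged[j] + mcAdcAbovePedestal[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] pulser = {100, 101, 99, 100, 102, 100}; // made-up pedestal-level samples
        int[] mcPulse = {0, 40, 25, 10};              // made-up simulated pulse
        System.out.println(Arrays.toString(overlay(pulser, mcPulse, 2, 4095)));
        // prints [100, 101, 99, 140, 127, 110]
    }
}
```

Because the pulser samples already contain the real pedestal and noise, only the MC contribution above pedestal is added in this sketch. A wrong `offset` would shift the signal relative to the background, which is why the timing matters.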

tongtongcao commented 3 years ago

Explanations for updates of the readout system: Update of the Readout System for Beam Background Merging.pdf

The steering file for the overlay tool:

<?xml version="1.0" encoding="UTF-8"?>
<lcsim xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" xs:noNamespaceSchemaLocation="http://www.lcsim.org/schemas/lcsim/1.0/lcsim.xsd">
    <inputFiles>
        <!-- signal input file with MC events -->
        <file>./tritrig.slcio</file>
    </inputFiles>
    <execute>
        <driver name="OverlayDriver" />
        <driver name="OutputDriver" />
    </execute>
    <drivers>
        <driver name="OverlayDriver" type="org.hps.digi.DataOverlayDriver">
            <!-- pulser overlay file -->
            <inputFile>./pulser.slcio</inputFile>
        </driver>
        <driver name="OutputDriver" type="org.lcsim.util.loop.LCIODriver">
            <!-- output file will contain MC collections from signal and data collections from background -->
            <outputFilePath>./tritrig_and_pulser_data.slcio</outputFilePath>
        </driver>
    </drivers>
</lcsim>

The steering file for readout:

<?xml version="1.0" encoding="UTF-8"?>
<lcsim xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
       xs:noNamespaceSchemaLocation="http://www.lcsim.org/schemas/lcsim/1.0/lcsim.xsd">
    <!-- 
      Readout steering file with singles trigger for 2019 MC.
      @author <a href="mailto:caot@jlab.org">Tongtong Cao</a>
    -->
    <execute>
        <!-- SLiC Data Input Drivers -->
        <driver name="EcalHitsInputDriver"/>
        <driver name="MCParticleInputDriver"/>
        <driver name="HodoscopeHitsOutputDriver"/>

        <!-- Pulser Data Input Drivers -->
        <driver name="EcalRawHitsInputDriver"/>
        <driver name="HodoscopeRawHitsInputDriver"/>

        <!-- Hodoscope Readout Simulation Drivers -->
        <driver name="HodoscopePreprocessingDriver"/>
        <driver name="HodoscopeDigitizationnWithPulserDataMergingDriver"/>
        <driver name="HodoscopeRawConverterDriver"/>
        <driver name="HodoscopePatternDriver"/>

        <!-- Calorimeter Readout Simulation Drivers -->
        <driver name="EcalDigitizationWithPulserDataMergingDriver"/>  
        <driver name="EcalRawConverterDriver"/>
        <driver name="GTPReadoutDriver"/>      

        <!-- Trigger Simulation -->
        <driver name="SinglesTrigger"/>

        <!-- LCIO Output and Data Management Driver -->
        <driver name="ReadoutManagerDriver"/>

        <driver name="CleanupDriver" />
    </execute> 

    <drivers>
        <!--
             Truth handler drivers load truth information from the input SLIC file
             and pass them off to the readout data manager, where they may be
             accessed by other readout drivers.

             It is required that these drivers specify the name of the collection
             that they manage, and be of the appropriate handler type that matches
             the object type of the collection. They may also, optionally, specify
             whether the truth collection managed by the driver should be output
             into the readout file, and if so, over what time range.

             By default, SLIC truth data is not written out. If no output window is
             specified, and truth is written, the output window will be derived
             from the readout window and trigger offset parameters of the readout
             data manager.

             In general, calorimeter truth information (and the related particles
             data) are best handled by including truth readout in the calorimeter
             simulation. This will automatically include all calorimeter truth hits
             and related MC particles in the readout file.
          -->
        <driver name="EcalHitsInputDriver" type="org.hps.readout.SimCalorimeterHitReadoutDriver">
            <collectionName>EcalHits</collectionName>

            <!-- Units of ns. -->
            <readoutWindowBefore>8.0</readoutWindowBefore>
            <readoutWindowAfter>32.0</readoutWindowAfter>
            <persistent>true</persistent>
        </driver>

        <driver name="MCParticleInputDriver" type="org.hps.readout.MCParticleReadoutDriver">
            <collectionName>MCParticle</collectionName>

            <!-- Units of ns. -->
            <readoutWindowBefore>32.0</readoutWindowBefore>
            <readoutWindowAfter>32.0</readoutWindowAfter>
            <persistent>true</persistent>
        </driver>

        <driver name="HodoscopeHitsOutputDriver" type="org.hps.readout.SimTrackerHitReadoutDriver">
            <collectionName>HodoscopeHits</collectionName>

            <!-- Units of ns. -->
            <readoutWindowBefore>8.0</readoutWindowBefore>
            <readoutWindowAfter>32.0</readoutWindowAfter>
            <persistent>true</persistent>
        </driver>

        <driver name="EcalRawHitsInputDriver" type="org.hps.digi.RawTrackerHitReadoutDriver">
            <collectionName>EcalReadoutHits</collectionName>

            <!-- Units of ns. -->
            <readoutWindowBefore>32.0</readoutWindowBefore>
            <readoutWindowAfter>32.0</readoutWindowAfter>
            <persistent>true</persistent>
        </driver>

        <driver name="HodoscopeRawHitsInputDriver" type="org.hps.digi.RawTrackerHitReadoutDriver">
            <collectionName>HodoReadoutHits</collectionName>

            <!-- Units of ns. -->
            <readoutWindowBefore>32.0</readoutWindowBefore>
            <readoutWindowAfter>32.0</readoutWindowAfter>
            <persistent>true</persistent>
        </driver>

       <!--
             The calorimeter readout driver handles conversion of SLIC truth
             hits into voltage pulses and ultimately into ADC counts every 4 ns
             sample. These samples are then integrated and output as hits which
             are used internally by the readout simulation in the collection
             set by variable "outputHitCollectionName".

             When a trigger occurs, the ADC buffer is used to generate readout
             hits. The exact form these take differs based on the mode that is
             simulated, but they are always output to the collection defined by
             variable "readoutHitCollectionName".

             If truth information is enabled, then a set of truth relations are
             output as well that link each readout hit to all of the truth hits
             that are associated with it. Additionally, all truth hits as well
             as the particle (and its parents) that generated that truth hit
             are written out to ensure that they are available post-readout.
             The additional truth information is automatically written to the
             same collection name as the input truth data. If a truth handler
             driver also outputs data into this collection, the two will merge.
          -->
        <driver name="EcalDigitizationWithPulserDataMergingDriver" type="org.hps.digi.EcalDigitizationWithPulserDataMergingReadoutDriver">
            <!-- LCIO Collection Names -->
            <inputHitCollectionName>EcalHits</inputHitCollectionName>
            <inputPulserDataCollectionName>PulserDataEcalReadoutHits</inputPulserDataCollectionName>
            <outputHitCollectionName>EcalRawHits</outputHitCollectionName>
            <readoutHitCollectionName>EcalReadoutHits</readoutHitCollectionName>
            <truthRelationsCollectionName>EcalTruthRelations</truthRelationsCollectionName>
            <triggerPathTruthRelationsCollectionName>TriggerPathTruthRelations</triggerPathTruthRelationsCollectionName>

            <!-- Driver Parameters -->
            <mode>1</mode>                                  <!-- Allowed values: 1, 3, or 7. -->
            <addNoise>true</addNoise>

            <!-- 
                 Readout offset is not the same as in the old system: it measures
                 the number of samples that are included before the trigger time
                 in the ADC readout window. The old version can be converted by
                 selecting a readout window equal to (readoutLatency - 64).
              -->
            <pulserDataWindow>48</pulserDataWindow>         <!-- Units of 4 ns clock-cycles. -->
            <readoutOffset>10</readoutOffset>               <!-- Units of 4 ns clock-cycles. -->
            <readoutWindow>48</readoutWindow>               <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesAfter>23</numberSamplesAfter>     <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesBefore>3</numberSamplesBefore>    <!-- Units of 4 ns clock-cycles. -->
            <integrationThreshold>18</integrationThreshold> <!-- Units of ADC. -->

            <!--
                The digitization driver produces as output a list of ADC values
                within a specified window, in emulation of Mode-1 data. This
                removes the truth information that is otherwise present in the
                original SLiC output. Setting this option to true creates new
                LCRelation objects that link the ADC list to the truth hits that
                created it and stores this in readout. For production running,
                this should generally be off to save space. Truth hits will be
                included in readout automatically - the truth hit driver above
                does not need to be persistent.
            -->
            <writeTruth>true</writeTruth>

            <!--
                As above, except that truth relations are persisted for the
                readout hits that are seen by the clusterer and trigger. This
                is useful if some analysis needs to be performed at the readout
                level. Otherwise, this should be left off.
            -->
            <writeTriggerPathTruth>false</writeTriggerPathTruth>
        </driver>

        <!--
             The raw converter handles the conversion of simulated ADC pulses from
             the calorimeter readout driver into proper hits that can be used for
             triggering.

             Note that it allows for these hits to be written to LCIO if desired,
             though by default they are not persisted. This is generally unneeded,
             since the clusterer will automatically output the hits which appear in
             GTP clusters if cluster output is enabled.
          -->
        <driver name="EcalRawConverterDriver" type="org.hps.readout.ecal.updated.EcalRawConverterReadoutDriver">
            <!--
                Define the LCIO collection names. The input collection should
                come from the digitization driver.
            -->
            <inputCollectionName>EcalRawHits</inputCollectionName>
            <outputCollectionName>EcalCorrectedHits</outputCollectionName>

            <!--
                NSA and NSB must match the digitization driver settings.
            -->
            <numberSamplesAfter>23</numberSamplesAfter>     <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesBefore>3</numberSamplesBefore>    <!-- Units of 4 ns clock-cycles. -->

            <!--
                Outputs all the trigger-level hits within the readout window.
                This is not generally necessary outside of specialized
                circumstances. Note that all readout hits associated with GTP
                clusters will automatically be included if GTP clusters are
                persisted.
            -->
            <persistent>false</persistent>
        </driver>

        <!--
             The GTP clusterer creates clusters from converted calorimeter hits
             for use in the trigger. It takes two parameters: the clustering
             window, which specifies the number of clock-cycles in which a seed
             hit candidate must be a spatiotemporal maximum, and seed energy
             threshold, which specifies how much energy a seed hit candidate
             must have in order to be used.

             The GTP clusterer is able to be persisted into LCIO. If persisted,
             it will output all of the clusters in the readout window range
             (either specified manually as with the truth drivers or using the
             default readout window and trigger offset of the manager). It will
             also output all of the hits contained in each cluster into the
             output file as well.
          -->
        <driver name="GTPReadoutDriver" type="org.hps.readout.ecal.updated.GTPClusterReadoutDriver">
            <!-- Units of 4 ns clock-cycles. -->
            <clusterWindow>4</clusterWindow>
            <!-- Units of GeV. -->
            <seedEnergyThreshold>0.050</seedEnergyThreshold>

            <!--
                Specifies whether GTP clusters should be persisted in the
                output LCIO data. This should generally be false for production
                running to save space, but can be useful for some analyses, as
                the exact trigger-time clusters are not otherwise recoverable
                after readout.
            -->
            <persistent>false</persistent>
        </driver>

        <!--
            The hodoscope preprocessing driver is responsible for taking SLiC
            truth hits and converting them to a form usable by the simulation.
            Truth hits are of the type SimTrackerHit and need to be converted
            to the type SimCalorimeterHit. Additionally, SLiC is not able to
            produce hits on multiple channels from the same scintillator. Some
            scintillators do have multiple channels in the actual detector. As
            such, it is necessary to split the energies of hits occurring on
            scintillators with multiple channels across the relevant channels.
            This is handled by this driver.
        -->
        <driver name="HodoscopePreprocessingDriver" type="org.hps.readout.hodoscope.HodoscopePreprocessingDriver">
            <truthHitCollectionName>HodoscopeHits</truthHitCollectionName>
            <outputHitCollectionName>HodoscopePreprocessedHits</outputHitCollectionName>

            <persistent>true</persistent>
        </driver>

        <!--
             The hodoscope digitization driver works identically to the
             calorimeter digitization driver, except for the hodoscope.
          -->
        <driver name="HodoscopeDigitizationnWithPulserDataMergingDriver" type="org.hps.digi.HodoscopeDigitizationWithPulserDataMergingReadoutDriver">
            <!-- LCIO Collection Names -->
            <inputHitCollectionName>HodoscopePreprocessedHits</inputHitCollectionName>
            <inputPulserDataCollectionName>PulserDataHodoReadoutHits</inputPulserDataCollectionName>
            <outputHitCollectionName>HodoscopeRawHits</outputHitCollectionName>
            <readoutHitCollectionName>HodoscopeReadoutHits</readoutHitCollectionName>
            <truthRelationsCollectionName>HodoscopeTruthRelations</truthRelationsCollectionName>
            <triggerPathTruthRelationsCollectionName>HodoscopeTriggerPathTruthRelations</triggerPathTruthRelationsCollectionName>

            <!-- Driver Parameters -->
            <mode>1</mode>                                  <!-- Allowed values: 1, 3, or 7. -->
            <addNoise>false</addNoise>

            <!-- 
                 Readout offset is not the same as in the old system: it measures
                 the number of samples that are included before the trigger time
                 in the ADC readout window. The old version can be converted by
                 selecting a readout window equal to (readoutLatency - 64).
              -->
            <pulserDataWindow>32</pulserDataWindow>         <!-- Units of 4 ns clock-cycles. -->
            <readoutOffset>10</readoutOffset>               <!-- Units of 4 ns clock-cycles. -->
            <readoutWindow>32</readoutWindow>               <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesAfter>13</numberSamplesAfter>     <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesBefore>5</numberSamplesBefore>    <!-- Units of 4 ns clock-cycles. -->
            <integrationThreshold>12</integrationThreshold> <!-- Units of ADC. -->

            <!-- Factor for gain conversion from self-defined unit/ADC to MeV/ADC. -->
            <factorGainConversion>0.000833333</factorGainConversion>

            <!--
                The digitization driver produces as output a list of ADC values
                within a specified window, in emulation of Mode-1 data. This
                removes the truth information that is otherwise present in the
                original SLiC output. Setting this option to true creates new
                LCRelation objects that link the ADC list to the truth hits that
                created it and stores this in readout. For production running,
                this should generally be off to save space. Truth hits will be
                included in readout automatically - the truth hit driver above
                does not need to be persistent.
            -->
            <writeTruth>true</writeTruth>

            <!--
                As above, except that truth relations are persisted for the
                readout hits that are seen by the clusterer and trigger. This
                is useful if some analysis needs to be performed at the readout
                level. Otherwise, this should be left off.
            -->
            <writeTriggerPathTruth>false</writeTriggerPathTruth>
        </driver>

        <!--
             The hodoscope raw converter functions as per the calorimeter raw
             converter driver.
          -->
        <driver name="HodoscopeRawConverterDriver" type="org.hps.readout.hodoscope.HodoscopeRawConverterReadoutDriver">
            <!--
                Define the LCIO collection names. The input collection should
                come from the digitization driver.
            -->
            <inputCollectionName>HodoscopeRawHits</inputCollectionName>
            <outputCollectionName>HodoscopeCorrectedHits</outputCollectionName>

            <!--
                NSA and NSB must match the digitization driver settings.
            -->
            <numberSamplesAfter>13</numberSamplesAfter>     <!-- Units of 4 ns clock-cycles. -->
            <numberSamplesBefore>5</numberSamplesBefore>    <!-- Units of 4 ns clock-cycles. -->

            <!--
                Unit conversion factor applied to the value returned by
                AbstractBaseRawConverter::adcToEnergy(). For the hodoscope, the
                energy of FADC hits is in a self-defined unit. The conversion
                from self-defined unit/ADC to MeV/ADC for truth hits is already
                taken into account in HodoscopeDigitizationReadoutDriver by
                factorGainConversion, so the factor here is 1.
            -->
            <factorUnitConversion>1</factorUnitConversion>

            <!--
                Outputs all the trigger-level hits within the readout window.
                This is not generally necessary outside of specialized
                circumstances. Note that all readout hits associated with GTP
                clusters will automatically be included if GTP clusters are
                persisted.
            -->
            <persistent>false</persistent>
        </driver>

        <!--
             HodoscopePatternDriver produces a hodoscope pattern list for each
             hodoscope layer, to be applied in Ecal-hodoscope matching in the
             trigger. Order of the pattern list: top layer 1, top layer 2,
             bottom layer 1, bottom layer 2.
          -->
        <driver name="HodoscopePatternDriver" type="org.hps.readout.hodoscope.HodoscopePatternReadoutDriver">
            <!--
                Define the LCIO collection names. The input collection should
                come from the digitization driver.
            -->
            <inputCollectionName>HodoscopeCorrectedHits</inputCollectionName>
            <outputCollectionName>HodoscopePatterns</outputCollectionName>

            <!--  Hodoscope FADC hit cut -->
            <fADCHitThreshold>1</fADCHitThreshold>
            <!--  Hodoscope tile/cluster hit cut -->
            <hodoHitThreshold>200</hodoHitThreshold>
            <!--  Persistence time of a hodoscope FADC hit, in ns -->
            <persistentTime>60</persistentTime>
            <!--  Time, in ns, by which hodoscope FADC hits enter the
                 trigger system earlier than Ecal hits -->
            <timeEarlierThanEcal>20</timeEarlierThanEcal>

            <persistent>false</persistent>
        </driver>

        <driver name="SinglesTrigger" type="org.hps.readout.trigger2019.SinglesTrigger2019ReadoutDriver">
            <inputCollectionNameEcal>EcalClustersGTP</inputCollectionNameEcal>

            <inputCollectionNameHodo>HodoscopePatterns</inputCollectionNameHodo>

            <!-- Units of 2 ns beam bunches. -->
            <deadTime>15</deadTime>

            <!-- Units of cluster hits. -->
            <hitCountThreshold>2</hitCountThreshold>
            <!-- Units of GeV. -->
            <clusterEnergyLowThreshold>0.200</clusterEnergyLowThreshold>
            <!-- Units of GeV. -->
            <clusterEnergyHighThreshold>3.000</clusterEnergyHighThreshold>

            <!-- Minimum cluster x coordinate -->
            <clusterXMin>5</clusterXMin>

            <clusterPDEC0>1.9</clusterPDEC0>
            <clusterPDEC1>-0.1716</clusterPDEC1>
            <clusterPDEC2>0.00583</clusterPDEC2>
            <clusterPDEC3>-0.0000536</clusterPDEC3>

            <geometryMatchingRequired>true</geometryMatchingRequired>

        </driver>

        <driver name="ReadoutManagerDriver" type="org.hps.readout.ReadoutDataManager">
            <readoutWindow>200</readoutWindow>
            <outputFile>${outputFile}.slcio</outputFile>
        </driver>

        <driver name="CleanupDriver" type="org.lcsim.recon.tracking.digitization.sisim.config.ReadoutCleanupDriver" />
    </drivers>
</lcsim>
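Many parameters in the readout steering file are given in 4 ns clock-cycles, and the trigger dead time is given in 2 ns beam bunches. As a quick sanity check, the sketch below converts the values quoted above to nanoseconds. The class and method names are hypothetical and not part of hps-java; only the numbers and the 4 ns and 2 ns unit sizes come from the steering file comments.

```java
/** Unit helpers for the steering-file values above (illustrative, not part of hps-java). */
final class ReadoutUnits {
    // Sample period stated in the steering-file comments.
    static final double NS_PER_CLOCK_CYCLE = 4.0;
    // Bunch spacing stated in the comment next to <deadTime>.
    static final double NS_PER_BEAM_BUNCH = 2.0;

    static double cyclesToNs(int cycles) {
        return cycles * NS_PER_CLOCK_CYCLE;
    }

    static double bunchesToNs(int bunches) {
        return bunches * NS_PER_BEAM_BUNCH;
    }

    public static void main(String[] args) {
        System.out.println("Ecal readoutWindow      : " + cyclesToNs(48) + " ns");
        System.out.println("Hodoscope readoutWindow : " + cyclesToNs(32) + " ns");
        System.out.println("readoutOffset           : " + cyclesToNs(10) + " ns");
        System.out.println("Ecal NSA / NSB          : " + cyclesToNs(23) + " / " + cyclesToNs(3) + " ns");
        System.out.println("Hodoscope NSA / NSB     : " + cyclesToNs(13) + " / " + cyclesToNs(5) + " ns");
        System.out.println("Trigger deadTime        : " + bunchesToNs(15) + " ns");
    }
}
```

For example, the calorimeter readout window of 48 clock-cycles corresponds to 192 ns, and the dead time of 15 bunches corresponds to 30 ns.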

A script for running programs:

#!/bin/bash

# evio skimmed files: /cache/mss/hallb/hps/physrun2019/production/evio-skims/fcup/010666

echo "evio to lcio for pulser data"
java -Dorg.hps.conditions.url=jdbc:sqlite:/work/slac/data/conditions/hps_conditions_2020_08_17.db -cp ./hps-distribution-bin.jar org.hps.evio.EvioToLcio -l pulser.slcio -d HPS-PhysicsRun2019-v2-4pt5 -R 10666 pulser.evio

echo "overlay MC data with pulser data"
java -jar /Users/tongtongcao/research/hps/hps-java/distribution/target/hps-distribution-4.5-SNAPSHOT-bin.jar /Users/tongtongcao/research/hps/data/2019/experiment/merged/Overlay.lcsim -d HPS-PhysicsRun2019-v2-4pt5 -R 10666 -n 1000 -e 1

echo "space overlaid events......."
/usr/bin/time java -DdisableSvtAlignmentConstants -XX:+UseSerialGC -Xmx300m -cp /Users/tongtongcao/research/hps/hps-java/distribution/target/hps-distribution-4.5-SNAPSHOT-bin.jar org.hps.util.FilterMCBunches -e 250 tritrig_and_pulser_data.slcio tritrig_and_pulser_spaced.slcio -a

echo "readout"
java -DdisableSvtAlignmentConstants -XX:+UseSerialGC -Xmx500m  -jar /Users/tongtongcao/research/hps/hps-java/distribution/target/hps-distribution-4.5-SNAPSHOT-bin.jar /Users/tongtongcao/research/hps/data/2019/experiment/merged/testSteering.lcsim -i tritrig_and_pulser_spaced.slcio -DoutputFile=readout -d HPS-PhysicsRun2019-v2-4pt5 -R 10666
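The spacing step in the script can be checked with simple arithmetic. The sketch below assumes that the `-e 250` option of `FilterMCBunches` is the interval, in beam bunches, between consecutive overlaid events, and that the `readoutWindow` of 200 in `ReadoutManagerDriver` is in ns. Neither assumption is stated in this thread, and the class below is hypothetical. The 2 ns bunch spacing is taken from the steering file comment next to `<deadTime>`.

```java
/** Illustrative check of event spacing against the readout window (not part of hps-java). */
final class EventSpacingSketch {
    // Bunch spacing taken from the steering-file comment next to <deadTime>.
    static final double NS_PER_BEAM_BUNCH = 2.0;

    /** Time between consecutive signal events for a given bunch interval. */
    static double spacingNs(int bunchInterval) {
        return bunchInterval * NS_PER_BEAM_BUNCH;
    }

    /** True if two consecutive signal events could fall into the same readout window. */
    static boolean windowsCanOverlap(int bunchInterval, double readoutWindowNs) {
        return spacingNs(bunchInterval) < readoutWindowNs;
    }

    public static void main(String[] args) {
        int interval = 250;             // value passed to -e in the script (meaning assumed)
        double readoutWindowNs = 200.0; // ReadoutManagerDriver readoutWindow (unit assumed to be ns)
        System.out.println("Spacing between signal events: " + spacingNs(interval) + " ns");
        System.out.println("Readout windows can overlap: "
                + windowsCanOverlap(interval, readoutWindowNs));
    }
}
```

Under these assumptions the events are 500 ns apart, which is longer than the 200 ns readout window, so each readout window would contain at most one overlaid event.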

mholtrop commented 2 years ago

Can this be closed? It seems Tongtong's work on merging data allows us to close this.

tongtongcao commented 2 years ago

We can close this issue. The new readout system has been developed and well tested. The PR has been approved.