JeffersonLab / hps-java

HPS reconstruction and analysis framework in Java

Fix Recon getting stuck on specific events. #251

Closed mholtrop closed 6 years ago

mholtrop commented 6 years ago

When running recon, the process gets stuck on specific events. The files for which this happens from run 5772 are documented here: tpass7f

Specific event: run 5772, file 240, event number 61402770

Output:

2017-11-24 12:24:11 [INFO] org.lcsim.job.EventPrintLoopAdapter recordSupplied :: event: 61402766; time: 1431857544478647336; seq: 0
2017-11-24 12:24:11 [INFO] org.lcsim.job.EventPrintLoopAdapter recordSupplied :: event: 61402767; time: 1431857544478724340; seq: 1
2017-11-24 12:24:11 [INFO] org.lcsim.job.EventPrintLoopAdapter recordSupplied :: event: 61402768; time: 1431857544478732220; seq: 2
2017-11-24 12:24:11 [INFO] org.lcsim.job.EventPrintLoopAdapter recordSupplied :: event: 61402769; time: 1431857544478816276; seq: 3

At this point the process hangs until it is killed.

mholtrop commented 6 years ago

Stack trace:

Evio2Lcio [Java Application]    
    org.hps.evio.EvioToLcio at localhost:62943  
        Thread [main] (Suspended)   
            TrackUtils.extrapolateTrackUsingFieldMap(TrackState, double, double, double, FieldMap) line: 1566   
            TrackDataDriver.process(EventHeader) line: 276  
            TrackDataDriver(Driver).doProcess(EventHeader) line: 261    
            Driver.processChildren(EventHeader) line: 271   
            Driver.process(EventHeader) line: 187   
            DriverAdapter.recordSupplied(RecordEvent) line: 74  
            JobManager(JobControlManager).processEvent(EventHeader) line: 819   
            EvioToLcio.run() line: 618  
            EvioToLcio.main(String[]) line: 92  
        Daemon Thread [Abandoned connection cleanup thread] (Running)   
    /Library/Java/JavaVirtualMachines/jdk1.8.0_92.jdk/Contents/Home/bin/java (Nov 24, 2017, 12:29:48 PM)    

Code:

        while (currentPosition.x() < endPositionX) {

            // The field map coordinates are in the detector frame so the
            // extrapolated track position needs to be transformed from the
            // track frame to detector.
            Hep3Vector currentPositionDet = CoordinateTransformations.transformVectorToDetector(currentPosition);

            // Get the field at the current position along the track.
            bFieldY = fieldMap.getField(currentPositionDet).y();
            // System.out.println("Field along y (z in detector): " + bField);

            // Get a trajectory (Helix or Line objects) created with the
            // track parameters at the current position.
            Trajectory trajectory = getTrajectory(currentMomentum, new org.lcsim.spacegeom.SpacePoint(currentPosition), q, bFieldY);

            // Using the new trajectory, extrapolate the track by a step and
            // update the extrapolated position.
            currentPosition = trajectory.getPointAtDistance(stepSize);
            // System.out.println("Current position: " + ((Hep3Vector)
            // currentPosition).toString());

            // Calculate the momentum vector at the new position. This will
            // be used when creating the trajectory that will be used to
            // extrapolate the track in the next iteration.
            currentMomentum = VecOp.mult(currentMomentum.magnitude(), trajectory.getUnitTangentAtLength(stepSize));

            // If the position of the track along X (or z in the detector frame)
            // is past 80% of the total distance, reduce the step size (once).
            if (currentPosition.x() / endPositionX > .80 && !stepSizeChange) {
                stepSize /= 10;
                // System.out.println("Changing step size: " + stepSize);
                stepSizeChange = true;
            }
        }

mholtrop commented 6 years ago

Some debugging shows that this loop fails for tracks with very small momentum that curl and end up going backwards. For those tracks the condition currentPosition.x() < endPositionX stays true, so the loop never terminates.

Adding a loop counter shows that this loop very often executes on the order of 600 times. Aborting after 10000 iterations was one possible option, but then we would keep stepping a track for some time for no good reason.

It became clear that a test for currentMomentum.x() < 0 would be the quickest way to detect that this track was not going to get to the endPositionX.

Minimal change to the code:

  1. Add a test for currentMomentum.x() < 0. Throw an exception if it is encountered.
  2. Catch the exception in TrackDataDriver.process().

There may now be additional places where we need to catch this exception.
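
The two steps above can be sketched in isolation. This is a simplified, hypothetical model and not the hps-java code: the track is a plain circular arc of fixed radius rather than a trajectory stepped through a field map, and the names BackwardTrackGuardSketch, TrackGoingBackwardsException, stepUntil and tryStep are made up for illustration. It only shows why a check on the sign of the x momentum ends the loop for a track that can never reach endX.

```java
class BackwardTrackGuardSketch {

    /** Hypothetical exception type; the thread only says "throw exception". */
    static class TrackGoingBackwardsException extends RuntimeException {
        TrackGoingBackwardsException(String message) {
            super(message);
        }
    }

    /**
     * Steps along a circular arc that starts at the origin heading in +x,
     * until x reaches endX. Returns the number of steps taken.
     * Assumption: a fixed radius stands in for the field-map-dependent helix.
     */
    static int stepUntil(double endX, double radius, double stepSize) {
        double x = 0.0;
        double phi = 0.0; // direction angle; the x momentum is proportional to cos(phi)
        int steps = 0;
        while (x < endX) {
            // Advance by one arc length of stepSize along the circle.
            double dPhi = stepSize / radius;
            x += radius * (Math.sin(phi + dPhi) - Math.sin(phi));
            phi += dPhi;
            steps++;

            // Step 1: a negative x momentum means the track has turned around
            // and x can no longer reach endX, so stop instead of looping forever.
            if (Math.cos(phi) < 0) {
                throw new TrackGoingBackwardsException(
                        "track going backwards, killed after " + steps + " steps");
            }
        }
        return steps;
    }

    /** Step 2: the caller catches the exception; -1 marks an aborted extrapolation. */
    static int tryStep(double endX, double radius, double stepSize) {
        try {
            return stepUntil(endX, radius, stepSize);
        } catch (TrackGoingBackwardsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Large radius (high momentum): the arc reaches endX.
        System.out.println("stiff track steps: " + tryStep(50.0, 1000.0, 1.0));
        // Radius smaller than endX (low momentum): without the guard this never returns.
        System.out.println("soft track steps:  " + tryStep(50.0, 10.0, 1.0));
    }
}
```

In this toy model the guard fires as soon as the track turns around, rather than after a fixed iteration cap. That is the advantage over the 10000-iteration limit considered above.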

mholtrop commented 6 years ago

Unfortunately, doing this causes a new NullPointerException.

2017-11-24 19:31:47 [INFO] org.lcsim.job.EventPrintLoopAdapter recordSupplied :: event: 61402769; time: 1431857544478816276; seq: 55
2017-11-24 19:31:48 [WARNING] org.hps.recon.tracking.TrackUtils extrapolateTrackUsingFieldMap :: extrapolateTrackUsingFieldMap track going backwards - Killed

2017-11-24 19:31:48 [WARNING] org.hps.recon.tracking.TrackDataDriver process :: Exception in TrackDataDriver - TrackExtrapolate 

2017-11-24 19:31:48 [SEVERE] org.hps.recon.tracking.TrackDataDriver process :: extrapolateTrackUsingFieldMap track going backwards - Killed

java.lang.RuntimeException: extrapolateTrackUsingFieldMap track going backwards - Killed

    at org.hps.recon.tracking.TrackUtils.extrapolateTrackUsingFieldMap(TrackUtils.java:1598)
    at org.hps.recon.tracking.TrackDataDriver.process(TrackDataDriver.java:278)
    at org.lcsim.util.Driver.doProcess(Driver.java:261)
    at org.lcsim.util.Driver.processChildren(Driver.java:271)
    at org.lcsim.util.Driver.process(Driver.java:187)
    at org.lcsim.util.DriverAdapter.recordSupplied(DriverAdapter.java:74)
    at org.lcsim.job.JobControlManager.processEvent(JobControlManager.java:819)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:618)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)

Exception in thread "main" java.lang.NullPointerException
    at org.hps.recon.utils.TrackClusterMatcher.getNSigmaPosition(TrackClusterMatcher.java:325)
    at org.hps.recon.utils.TrackClusterMatcher.getNSigmaPosition(TrackClusterMatcher.java:306)
    at org.hps.recon.particle.ReconParticleDriver.makeReconstructedParticles(ReconParticleDriver.java:459)
    at org.hps.recon.particle.ReconParticleDriver.process(ReconParticleDriver.java:646)
    at org.hps.recon.particle.HpsReconParticleDriver.process(HpsReconParticleDriver.java:141)
    at org.lcsim.util.Driver.doProcess(Driver.java:261)
    at org.lcsim.util.Driver.processChildren(Driver.java:271)
    at org.lcsim.util.Driver.process(Driver.java:187)
    at org.lcsim.util.DriverAdapter.recordSupplied(DriverAdapter.java:74)
    at org.lcsim.job.JobControlManager.processEvent(JobControlManager.java:819)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:618)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)

mholtrop commented 6 years ago

In TrackClusterMatcher, add:

        if (trackStateAtEcal == null) {
            // The track never reached the ECAL: it curled first, and
            // extrapolateTrackUsingFieldMap probably aborted.
            return Double.MAX_VALUE;
        }

to avoid a null pointer exception if the track never intersected with the ECAL plane.
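
A minimal sketch of how such a sentinel behaves downstream, under assumptions: the real getNSigmaPosition calculation is not shown in this thread, so the distance formula, the EcalState class and the isMatch cut below are hypothetical stand-ins. The point is only that Double.MAX_VALUE fails any finite matching cut, so a track without an ECAL state is simply never matched.

```java
class NullTrackStateSketch {

    /** Hypothetical stand-in for a track state at the ECAL face: just a position. */
    static final class EcalState {
        final double x;
        final double y;

        EcalState(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Simplified "n sigma" distance between a track and a cluster.
     * Assumption: a single resolution value replaces the real parametrization.
     */
    static double nSigma(EcalState trackStateAtEcal, double clusterX, double clusterY, double sigma) {
        if (trackStateAtEcal == null) {
            // No state at the ECAL: return a value that can never pass a cut.
            return Double.MAX_VALUE;
        }
        double dx = trackStateAtEcal.x - clusterX;
        double dy = trackStateAtEcal.y - clusterY;
        return Math.hypot(dx, dy) / sigma;
    }

    /** Hypothetical matching cut: accept the pair only below maxNSigma. */
    static boolean isMatch(EcalState state, double clusterX, double clusterY, double sigma, double maxNSigma) {
        return nSigma(state, clusterX, clusterY, sigma) < maxNSigma;
    }

    public static void main(String[] args) {
        EcalState reached = new EcalState(3.0, 4.0);
        System.out.println("reached ECAL, matched: " + isMatch(reached, 0.0, 0.0, 1.0, 10.0));
        System.out.println("no ECAL state, matched: " + isMatch(null, 0.0, 0.0, 1.0, 10.0));
    }
}
```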

mholtrop commented 6 years ago

Tried this fix against run 5772 files 240, 194, and 133. Each of these files now runs to completion.