Closed by normangraf 6 years ago
This runtime exception seems always to be preceded by a warning message from WTrack, namely:
WTrack: this track started to go backwards?! params [WTrack params [NaN, NaN, NaN, 0.02215501332857945, NaN, NaN, NaN, ]
We should simply kill this track after issuing the warning. Ultimately we should resolve the root cause, which is perhaps looser cuts on strategies picking up loopers.
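A minimal sketch of the "warn, then drop the track" idea, assuming the seven track parameters are available as a plain double[]. The names TrackParamCheck, allFinite and keepTrack are hypothetical and are not part of hps-java.

```java
import java.util.Arrays;
import java.util.logging.Logger;

// Hypothetical helper, not part of hps-java: decides whether a track's
// parameters are usable before any downstream fit is attempted.
class TrackParamCheck {
    private static final Logger LOG = Logger.getLogger(TrackParamCheck.class.getName());

    // True only if every parameter is a finite number (neither NaN nor infinite).
    static boolean allFinite(double[] params) {
        for (double p : params) {
            if (!Double.isFinite(p)) {
                return false;
            }
        }
        return true;
    }

    // Issues a warning and returns false for tracks that should be dropped.
    static boolean keepTrack(double[] params) {
        if (!allFinite(params)) {
            LOG.warning("Dropping track with non-finite parameters: " + Arrays.toString(params));
            return false;
        }
        return true;
    }
}
```

With the parameters from the warning above (six NaN values and one finite one), keepTrack would return false, so the track would be dropped before any refit is attempted.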
Should check whether the WTrack warning ever leads to a successful fit to make sure we don't lose good tracks.
I have found an event on which this Exception is thrown. I then skimmed this event and the Exception is NOT thrown on this single event. So I skimmed a few extra events both before and after. What is most curious is that the behavior of the reconstruction depends on how many events I process prior to getting to this event! A file containing 10 events can be found online at: http://www.lcsim.org/test/hps-java/problemFiles/matrixSingular_5772_10events.evio
Here is my command (running with the latest git master snapshot):
java -cp ~/git/hps-java/distribution/target/hps-distribution-4.0-SNAPSHOT-bin.jar org.hps.evio.EvioToLcio -r -x /org/hps/steering/recon/EngineeringRun2015FullRecon.lcsim -d HPS-EngRun2015-Nominal-v6-0-fieldmap -D outputFile=tmp matrixSingular_5772_10events.evio -e 1
This command results in the "Matrix is singular" Exception being thrown on event 79267165.
If I skip two events, viz.
java -cp ~/git/hps-java/distribution/target/hps-distribution-4.0-SNAPSHOT-bin.jar org.hps.evio.EvioToLcio -r -x /org/hps/steering/recon/EngineeringRun2015FullRecon.lcsim -d HPS-EngRun2015-Nominal-v6-0-fieldmap -D outputFile=tmp matrixSingular_5772_10events.evio -e 1 -s 2
The event is processed just fine and the command runs to completion.
I have no idea what is going on.
I would appreciate it if others could download the file and see if this is reproducible.
I have modified GBLRefitterDriver and MakeGblTracks to simply skip tracks for which the refit would fail.
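The skip-on-failure pattern can be sketched as follows. This is not the actual GBLRefitterDriver or MakeGblTracks code: SkipFailedRefits and refitAll are hypothetical names, and the sketch assumes the refit step signals failure by returning an empty Optional rather than throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not the actual hps-java code: a refit step that
// reports failure with an empty Optional instead of throwing.
class SkipFailedRefits {
    // Applies `refit` to each track and keeps only the successful results.
    static <T, R> List<R> refitAll(List<T> tracks, Function<T, Optional<R>> refit) {
        List<R> refitted = new ArrayList<>();
        for (T track : tracks) {
            Optional<R> result = refit.apply(track);
            if (result.isPresent()) {
                refitted.add(result.get());
            }
            // A failed refit is skipped; the remaining tracks are still processed.
        }
        return refitted;
    }
}
```

The point of the design is that one bad track costs only that track, not the event or the whole file.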
I have run this over the file mentioned above and it successfully processed event 79267165, viz:
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267047; time: 1431858526254520128; seq: 0
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267060; time: 1431858526255245688; seq: 1
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267075; time: 1431858526256018160; seq: 2
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267081; time: 1431858526256800012; seq: 3
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267098; time: 1431858526257510888; seq: 4
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267113; time: 1431858526258217696; seq: 5
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267165; time: 1431858526261200552; seq: 6
WTrack: this track started to go backwards?! params [WTrack params [NaN, NaN, NaN, 0.023117064077144367, NaN, NaN, NaN, ] with corresponding HelicalTrackFit: HelicalTrackFit: d0= 106.85803180556117 phi0= -0.8142691936809715 curvature: -0.003113146091491068 z0= 1.0396211176748624 tanLambda= -0.020735453823318976 ]
Can't find track intercept; aborting Track refit
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267189; time: 1431858526262687376; seq: 7
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267220; time: 1431858526264155672; seq: 8
[INFO] [org.lcsim.job.EventPrintLoopAdapter] event: 79267297; time: 1431858526268618396; seq: 9
[INFO] [org.hps.evio] Last physics event time: 1431858526 - Sun May 17 03:28:46 PDT 2015
EventFlagFilter Summary: events processed = 10 events passed = 9 rejection = 0.9
[INFO] [org.hps.evio] Job finished successfully!
I have successfully run the EngRun2015*ReconTest integrated tests.
I am running over the 48 unblinded evio partitions from run 5772. This may take a while to complete.
Resolved and merged with pull request 244.
Current theory: the error is a result of a TrackUtils.getHelixPlaneIntercept() failure (https://github.com/JeffersonLab/hps-java/pull/244/files#diff-9a345604cc2a44f1bc20ebbd00f53ec7L250) -- more precisely, a failure in the WTrack method getHelixAndPlaneIntercept() https://github.com/JeffersonLab/hps-java/blob/master/tracking/src/main/java/org/hps/recon/tracking/WTrack.java#L269 which TrackUtils.getHelixPlaneIntercept() calls.
This TrackUtils.getHelixPlaneIntercept method is called in several places in the code. One is in MultipleScattering.java, where there is a cryptic little piece of code that skips the call under certain conditions (presumably because the method would fail):

    // TODO Catch special cases where the incidental iteration procedure seems to fail
    if (Math.abs(helix.R()) < 2000 && Math.abs(helix.dca()) > 10.0)

https://github.com/JeffersonLab/hps-java/blob/master/tracking/src/main/java/org/hps/recon/tracking/MultipleScattering.java#L328

But this guard doesn't exist at the other call sites, which is probably where the error occurs.
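One way to make the guard available to every caller would be to factor the quoted condition into a shared predicate. The sketch below is illustrative only: InterceptGuard and isSpecialCase are hypothetical names that do not exist in hps-java, and the thresholds are copied verbatim from the condition quoted above.

```java
// Hypothetical helper, not part of hps-java: the condition quoted from
// MultipleScattering.java, factored out so other callers could reuse it.
class InterceptGuard {
    // Thresholds copied from the quoted condition; their units are whatever
    // helix.R() and helix.dca() use.
    static final double MAX_ABS_RADIUS = 2000.0;
    static final double MIN_ABS_DCA = 10.0;

    // True when the quoted condition holds: a small radius of curvature
    // combined with a large distance of closest approach.
    static boolean isSpecialCase(double radius, double dca) {
        return Math.abs(radius) < MAX_ABS_RADIUS && Math.abs(dca) > MIN_ABS_DCA;
    }
}
```

For the HelicalTrackFit printed in the log earlier in this thread (d0 of about 106.9 and curvature of about -0.0031), the condition holds if one assumes that the radius is 1/curvature (about 321 in magnitude) and that d0 plays the role of dca. That would be consistent with the theory that the unguarded call sites are where the failure occurs.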
Re-opening this issue to properly deal with the root cause. Options:
Believe this is resolved in 4.0.1 milestone.
Re-opening, with branch iss243 to actually eliminate the source of the errors rather than just catching them.
Resolved by #268.
"Matrix is singular" Runtime Exception is causing the reconstruction to abort. This exception needs to be caught and handled on an event basis, not a run partition file basis.
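Handling the exception per event could look roughly like the sketch below. It is not the lcsim or hps-java event loop: PerEventHandling and processAll are hypothetical names, and events are represented only by their numeric IDs for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event loop, not the lcsim/hps-java one: a RuntimeException
// thrown while processing one event is recorded and the loop moves on.
class PerEventHandling {
    // Processes every event; returns the IDs of events whose processing threw.
    static List<Long> processAll(List<Long> eventIds, Consumer<Long> processEvent) {
        List<Long> failed = new ArrayList<>();
        for (Long id : eventIds) {
            try {
                processEvent.accept(id);
            } catch (RuntimeException e) {
                // Report the failure, then continue with the next event.
                System.err.println("Skipping event " + id + ": " + e.getMessage());
                failed.add(id);
            }
        }
        return failed;
    }
}
```

Returning the list of failed event IDs keeps the failures visible, so they can still be counted and investigated after the job finishes.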