JeffersonLab / hps-java

HPS reconstruction and analysis framework in Java

2019 reconstruction fails #662

Closed andrea-celentano closed 4 years ago

andrea-celentano commented 4 years ago

After a git pull and a clean recompile (commit b65e7daf442481bc4ed73b391449c28a2bb1d22a (HEAD -> master, origin/master, origin/iss204e, origin/HEAD, iss660)),

I tried the command:

java -cp ~/work/hps/hps-java/distribution/target/hps-distribution-4.5-SNAPSHOT-bin.jar org.hps.evio.EvioToLcio -r -x /org/hps/steering/recon/PhysicsRun2019_NoSVT.lcsim -d HPS-PhysicsRun2019-v1-4pt5 -DoutputFile=test hps_fee_010022.evio.00100-00108

The data file is available at www.ge.infn.it/~celentan/data

The error I get is:

Exception in thread "main" java.lang.NullPointerException
    at org.hps.evio.AbstractSvtEvioReader.makeHit(AbstractSvtEvioReader.java:303)
    at org.hps.evio.Phys2019SvtEvioReader.makeHit(Phys2019SvtEvioReader.java:124)
    at org.hps.evio.Phys2019SvtEvioReader.makeHits(Phys2019SvtEvioReader.java:174)
    at org.hps.evio.AbstractSvtEvioReader.makeHits(AbstractSvtEvioReader.java:191)
    at org.hps.evio.LCSimEngRunEventBuilder.makeLCSimEvent(LCSimEngRunEventBuilder.java:189)
    at org.hps.evio.LCSimPhys2019EventBuilder.makeLCSimEvent(LCSimPhys2019EventBuilder.java:56)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:608)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)

normangraf commented 4 years ago

Please try running with the production jar found at

https://srs.slac.stanford.edu/nexus/repository/lcsim-maven2-snapshot/org/hps/hps-distribution/4.5-SNAPSHOT/hps-distribution-4.5-20200218.214859-16-bin.jar

Please let me know if this works.

pbutti commented 4 years ago

Hi guys, not sure if this is related, but I see tests failing on the current master. Here is the log of the test and the error:

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.628 sec <<< FAILURE! - in org.hps.test.it.PhysRun2019ReconTest
testIt(org.hps.test.it.PhysRun2019ReconTest) Time elapsed: 20.552 sec <<< ERROR!
java.lang.NullPointerException
    at org.hps.record.scalers.ScalerData.getScalerData(ScalerData.java:260)
    at org.hps.record.scalers.ScalersEvioProcessor.process(ScalersEvioProcessor.java:70)
    at org.hps.evio.LCSimEngRunEventBuilder.writeScalerData(LCSimEngRunEventBuilder.java:253)
    at org.hps.evio.LCSimEngRunEventBuilder.makeLCSimEvent(LCSimEngRunEventBuilder.java:198)
    at org.hps.evio.LCSimPhys2019EventBuilder.makeLCSimEvent(LCSimPhys2019EventBuilder.java:56)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:608)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)
    at org.hps.test.it.PhysRun2019ReconTest.testIt(PhysRun2019ReconTest.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:367)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:274)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:161)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:290)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:242)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)

andrea-celentano commented 4 years ago

Hi, I tried the jar file, still I get:

Exception in thread "main" java.lang.NullPointerException
    at org.hps.evio.AbstractSvtEvioReader.makeHit(AbstractSvtEvioReader.java:303)
    at org.hps.evio.Phys2019SvtEvioReader.makeHit(Phys2019SvtEvioReader.java:124)
    at org.hps.evio.Phys2019SvtEvioReader.makeHits(Phys2019SvtEvioReader.java:174)
    at org.hps.evio.AbstractSvtEvioReader.makeHits(AbstractSvtEvioReader.java:191)
    at org.hps.evio.LCSimEngRunEventBuilder.makeLCSimEvent(LCSimEngRunEventBuilder.java:189)
    at org.hps.evio.LCSimPhys2019EventBuilder.makeLCSimEvent(LCSimPhys2019EventBuilder.java:56)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:608)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)

normangraf commented 4 years ago

Hmmmm, bizarre. Not sure what's going on here...

mholtrop commented 4 years ago

I just ran your command on the current master on run 10030, not an FEE filter version, and it seems hps-java is running fine. Is this crash happening on the first event, or on a specific later event?

normangraf commented 4 years ago

Thanks for checking on this Maurik. I've not experienced any difficulties on any 2019 runs, which is why I am mystified. But I hate to just say "works for me."

mholtrop commented 4 years ago

Hello Andrea, Norman,

I loaded the file you made available and see the same problem. After a little bit the code crashes. The issue happens at event 22280116. I split off 10 events into an evio file (one event before, event 22280116 and 8 more) and ran this with the noSVT steering file:

2020-02-28 09:45:11 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 22280033 with sequence 0
2020-02-28 09:45:11 [INFO] org.lcsim.job.EventMarkerDriver process :: Event 22280116 with sequence 1
Exception in thread "main" java.lang.NullPointerException
    at org.hps.evio.AbstractSvtEvioReader.makeHit(AbstractSvtEvioReader.java:303)
    at org.hps.evio.Phys2019SvtEvioReader.makeHit(Phys2019SvtEvioReader.java:124)
    at org.hps.evio.Phys2019SvtEvioReader.makeHits(Phys2019SvtEvioReader.java:174)
    at org.hps.evio.AbstractSvtEvioReader.makeHits(AbstractSvtEvioReader.java:191)
    at org.hps.evio.LCSimEngRunEventBuilder.makeLCSimEvent(LCSimEngRunEventBuilder.java:189)
    at org.hps.evio.LCSimPhys2019EventBuilder.makeLCSimEvent(LCSimPhys2019EventBuilder.java:56)
    at org.hps.evio.EvioToLcio.run(EvioToLcio.java:608)
    at org.hps.evio.EvioToLcio.main(EvioToLcio.java:92)

So there does seem to be a bug in the AbstractSvtEvioReader, though this must be something rare, since we ran many events and have not seen this.

I put the filtered file at: http://www.nuclear.unh.edu/HPS/Data/hps_fee_010022_event_22280116.evio

I will try to see if the debugger tells me anything.

mholtrop commented 4 years ago

Debugging notes.

The error seems to happen deep inside the SVT code: in AbstractSvtEvioReader.java, makeHit(data, channel) calls getSensor(data) at line 300, where data contains:

data (id=201)
[0] 342627304
[1] 362288644
[2] 333714272
[3] 34802177
channel 19

Making debugging more difficult, each time you call this, the "id" for the data sample has a different value. This is some lcsim behavior whose reason I do not understand. The error occurs on the 78th hit out of 85+1 (starting the count at 0).

SvtEvioUtils.java decodes the offending hit as:

channel=19, apv=4, physicalChannel=19
getFebIDFromMultiSample(data) returns 10
getFebHybridIDFromMultisample returns 0

So daqPair ([10,0]) is passed to daqPairToSensor.get(), which returns null.

That null is assigned to "sensor", so the subsequent method call on sensor fails with the NullPointerException.

Conclusion:

Either:

  1. My debugging is bogus, and I don't understand lcsim stuff at all :heavy_check_mark:
  2. There is something messed up with this particular data sample. This is rare, so we just put a guard (a null check) around the lookup and print a message when this happens.
  3. The DAQ Map is messed up, and 10,0 is a legitimate ID.

Which of 2 or 3?
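
For reference, option 2 could look roughly like the sketch below. It is a self-contained illustration and not the actual hps-java code: the class DaqGuardSketch, the contents of the map, the sensor names, and the makeHit signature are all hypothetical stand-ins. Only the [10,0] pair and channel 19 are taken from the debugging notes above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

public class DaqGuardSketch {

    private static final Logger LOGGER = Logger.getLogger(DaqGuardSketch.class.getName());

    // Hypothetical stand-in for the DAQ map: (FEB ID, FEB hybrid ID) -> sensor name.
    // The real map is filled from the conditions database; these entries are made up.
    private static final Map<Map.Entry<Integer, Integer>, String> DAQ_PAIR_TO_SENSOR = new HashMap<>();
    static {
        DAQ_PAIR_TO_SENSOR.put(new SimpleImmutableEntry<>(9, 0), "sensor_feb9_hybrid0");
        DAQ_PAIR_TO_SENSOR.put(new SimpleImmutableEntry<>(9, 1), "sensor_feb9_hybrid1");
    }

    // Builds a (toy) hit description, or returns empty if the DAQ pair is unknown.
    public static Optional<String> makeHit(int febId, int febHybridId, int channel) {
        String sensor = DAQ_PAIR_TO_SENSOR.get(new SimpleImmutableEntry<>(febId, febHybridId));
        if (sensor == null) {
            // Guard: Map.get returns null for an unmapped key, so report it and
            // skip the hit instead of dereferencing the null sensor.
            LOGGER.warning("No sensor for DAQ pair [" + febId + "," + febHybridId
                    + "], channel " + channel + "; skipping hit");
            return Optional.empty();
        }
        return Optional.of(sensor + ":channel" + channel);
    }

    public static void main(String[] args) {
        System.out.println(makeHit(9, 0, 19));   // mapped pair (made-up entry)
        System.out.println(makeHit(10, 0, 19));  // the pair from the failing event
    }
}
```

If option 3 turns out to be the right answer, the warning message would at least make it easy to count how often [10,0] shows up before deciding whether the DAQ map needs a new entry.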