JeffersonLab / halld_recon

Reconstruction for the GlueX Detector

ReactionFilter crash on simulated REST file #355

Closed aaust closed 4 years ago

aaust commented 4 years ago

I noticed a strange effect while investigating failed tests from the MCwrapper-bot.

I created two REST files from the same smeared file, one with variation=mc (dana_rest_mc.hddm) and one with variation=default (dana_rest_default.hddm). The first one crashes ReactionFilter, the second one does not. I am trying to figure out what's wrong with it.

The REST files and the ReactionFilter config file (jana_config_analysis.cfg) can be found here: /work/halld2/home/aaustreg/tmp/mcwrapper_test/

aaust commented 4 years ago

I was using the software versions in recon-2017_01-ver03_16.xml for these tests, but I can reproduce the same behavior on the latest master when starting from the same smeared file.

zihlmann commented 4 years ago

It does appear that both hddm files are intact and provide useful data. So the crash with variation=mc must happen at the very end, after the hddm file is closed? How many threads did you use? Does it happen when running with only one thread? I see a similar issue when running the NPP MC: every once in a while, the reconstruction using the ReactionFilter crashes after the last event is analyzed, but the resulting hddm file seems to be OK.

aaust commented 4 years ago

ReactionFilter crashes immediately, and the output ROOT tree is empty. This happens in single-threaded mode, and it is reproducible.

There seems to be a problem with the true beam photon, but I can't say anything more at the moment.

sdobbs commented 4 years ago

There is something mysterious here!

ifarm1802.jlab.org> hddm-xml /work/halld2/home/aaustreg/tmp/mcwrapper_test/dana_rest_mc.hddm | grep eventNo | head
Warning in <UnknownClass::SetDisplay>: DISPLAY not set, setting it to login1.jlab.org:0.0
  <reconstructedPhysicsEvent eventNo="0" runNo="0">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="2" runNo="30279">
  <reconstructedPhysicsEvent eventNo="3" runNo="30279">
  <reconstructedPhysicsEvent eventNo="4" runNo="30279">
  <reconstructedPhysicsEvent eventNo="5" runNo="30279">
  <reconstructedPhysicsEvent eventNo="6" runNo="30279">
  <reconstructedPhysicsEvent eventNo="7" runNo="30279">
  <reconstructedPhysicsEvent eventNo="8" runNo="30279">

ifarm1802.jlab.org> 
ifarm1802.jlab.org> hddm-xml /work/halld2/home/aaustreg/tmp/mcwrapper_test/dana_rest_default.hddm | grep eventNo | head
Warning in <UnknownClass::SetDisplay>: DISPLAY not set, setting it to login1.jlab.org:0.0
  <reconstructedPhysicsEvent eventNo="0" runNo="0">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="2" runNo="30279">
  <reconstructedPhysicsEvent eventNo="3" runNo="30279">
  <reconstructedPhysicsEvent eventNo="4" runNo="30279">
  <reconstructedPhysicsEvent eventNo="5" runNo="30279">
  <reconstructedPhysicsEvent eventNo="6" runNo="30279">
  <reconstructedPhysicsEvent eventNo="7" runNo="30279">
  <reconstructedPhysicsEvent eventNo="8" runNo="30279">

Two event "1"s at once! This would indeed cause misery, since some of the memory management features of the analysis library are triggered on the event number =(((
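To illustrate the general failure mode, here is a toy Python model. It is not the halld_recon code: the class and function names are hypothetical, and the only assumption taken from the thread is that per-event state is reset when the event number changes. Under that assumption, a second event carrying the same number silently inherits state from the first.

```python
class EventKeyedCache:
    """Toy cache whose reset is triggered by a change in event number.

    Hypothetical illustration only; not the analysis library's implementation.
    """

    def __init__(self):
        self.current_event = None
        self.items = []

    def new_event(self, event_no):
        # Reset only when the event number differs from the previous one.
        if event_no != self.current_event:
            self.items = []
            self.current_event = event_no

    def add(self, item):
        self.items.append(item)


def process(events):
    """Return a snapshot of the cache contents at the end of each event."""
    cache = EventKeyedCache()
    snapshots = []
    for event_no, payload in events:
        cache.new_event(event_no)
        cache.add(payload)
        snapshots.append(list(cache.items))
    return snapshots


if __name__ == "__main__":
    # Two distinct events both labelled 1, as in the REST files above.
    print(process([(1, "a"), (1, "b"), (2, "c")]))
```

In this toy model the second event labelled 1 still sees the payload of the first, whereas events with distinct numbers each start from an empty cache.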

Looking at some other data simulated using AmpTools, I'm seeing the same thing.
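A quick way to check other files for the same pattern might look like the Python sketch below. It assumes the text format shown in the hddm-xml output above (a reconstructedPhysicsEvent tag with eventNo before runNo); the function name and the embedded sample are illustrative only.

```python
import re

# Matches the event tags as they appear in the hddm-xml output above.
EVENT_RE = re.compile(
    r'<reconstructedPhysicsEvent\s+eventNo="(\d+)"\s+runNo="(\d+)"'
)


def find_repeats(lines):
    """Return (run_no, event_no) for each event that repeats its predecessor."""
    repeats = []
    previous = None
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # skip warnings and any other non-event lines
        key = (int(match.group(2)), int(match.group(1)))
        if key == previous:
            repeats.append(key)
        previous = key
    return repeats


# First few event tags copied from the output above.
SAMPLE = """\
  <reconstructedPhysicsEvent eventNo="0" runNo="0">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="1" runNo="30279">
  <reconstructedPhysicsEvent eventNo="2" runNo="30279">
"""

if __name__ == "__main__":
    for run_no, event_no in find_repeats(SAMPLE.splitlines()):
        print(f"run {run_no}: event {event_no} appears twice in a row")
```

The same function could be fed the lines of a saved hddm-xml dump instead of the embedded sample.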

Strictly speaking, I am not sure whether this sanity check (in DSourceComboer::Reset_NewEvent()) is absolutely required. It should probably at least print a warning.

To fix the event generators, I think this line should be changed from 0 to 1, since event numbering in GlueX starts at 1. https://github.com/JeffersonLab/halld_sim/blob/fa6506f9745c4ba675ff9e84ab78af2923850054/src/libraries/AMPTOOLS_DATAIO/HDDMDataWriter.cc#L14

zihlmann commented 4 years ago

Using hd_dump, I see two events with event number 1, followed by event number 2, in both files.

sdobbs commented 4 years ago

I think these are resolved now

aaust commented 4 years ago

Sean is correct