SuperNEMO-DBD / Falaise

Simulation, Reconstruction and Analysis Software for the SuperNEMO Experiment
http://supernemo.org/Falaise
GNU General Public License v3.0

official-2.0.0.conf - gamma clustering crashes with Falaise 3.2 #99

Closed cherylepatrick closed 2 years ago

cherylepatrick commented 6 years ago

I'm trying the official 2.0.0 pipeline config with a fresh falaise 3.2 install, but the reconstruction crashes. Error indicates it's from the gamma clustering (and taking this module out of the config works). Crash doesn't seem to be exactly replicable from run to run - I can re-run on the same file and it might get further or less far through processing the file. We've run gamma clustering on bigger files than this before so it is probably just a config issue. Will investigate but am logging here for now in case anyone has a smart idea.

Error below:

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/Users/cpatrick/CadfaelBrew/opt/bayeux/lib/libBayeux.3.dylib] geomtools::geom_id::operator==(geomtools::geom_id const&) const (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/libFalaise.3.2.0.dylib] snemo::geometry::calo_locator::is_calo_block_in_current_module(geomtools::geom_id const&) const (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/Falaise/modules/libFalaise_GammaClustering.dylib] snemo::reconstruction::gamma_clustering_driver::_are_on_same_wall(snemo::datamodel::calibrated_calorimeter_hit const&, snemo::datamodel::calibrated_calorimeter_hit const&) const (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/Falaise/modules/libFalaise_GammaClustering.dylib] snemo::reconstruction::gamma_clustering_driver::_get_tof_association(std::__1::vector<std::__1::map<double, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const, std::__1::less<double>, std::__1::allocator<std::__1::pair<double const, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const> > >, std::__1::allocator<std::__1::map<double, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const, std::__1::less<double>, std::__1::allocator<std::__1::pair<double const, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const> > > > > const&, std::__1::vector<std::__1::map<double, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const, std::__1::less<double>, std::__1::allocator<std::__1::pair<double const, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const> > >, std::__1::allocator<std::__1::map<double, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const, std::__1::less<double>, std::__1::allocator<std::__1::pair<double const, datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> const> > > > >&) const (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/Falaise/modules/libFalaise_GammaClustering.dylib] snemo::reconstruction::gamma_clustering_driver::_process_algo(std::__1::vector<datatools::handle<snemo::datamodel::calibrated_calorimeter_hit>, std::__1::allocator<datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> > > const&, snemo::datamodel::particle_track_data&) (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/libFalaise.3.2.0.dylib] snemo::processing::base_gamma_builder::process(std::__1::vector<datatools::handle<snemo::datamodel::calibrated_calorimeter_hit>, std::__1::allocator<datatools::handle<snemo::datamodel::calibrated_calorimeter_hit> > > const&, snemo::datamodel::particle_track_data&) (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/Falaise/modules/libFalaise_GammaClustering.dylib] snemo::reconstruction::gamma_clustering_module::_process(datatools::things&) (no debug info)
[/Users/cpatrick/CadfaelBrew/Cellar/falaise/3.2.0/lib/Falaise/modules/libFalaise_GammaClustering.dylib] snemo::reconstruction::gamma_clustering_module::process(datatools::things&) (no debug info)
[/Users/cpatrick/CadfaelBrew/opt/bayeux/lib/libBayeux.3.dylib] dpp::chain_module::process(datatools::things&) (no debug info)
[/Users/cpatrick/CadfaelBrew/bin/flreconstruct] FLReconstruct::do_pipeline(FLReconstruct::FLReconstructParams const&) (no debug info)
[/Users/cpatrick/CadfaelBrew/bin/flreconstruct] FLReconstruct::do_flreconstruct(int, char**) (no debug info)
[/Users/cpatrick/CadfaelBrew/bin/flreconstruct] main (no debug info)
[/usr/lib/system/libdyld.dylib] start (no debug info)
[<unknown binary>] (no debug info)

cherylepatrick commented 6 years ago

Mostly making notes before I ignore this for the weekend - I tried Lauren's bare-bones reconstruct pipeline on the same input file, and it ran ok and reconstructed gammas. (But it was a beta beta file so not a lot of gammas to reconstruct). Tried it again on a sample with more gammas and it segfaulted. Ran it AGAIN on that same file and it didn't segfault but it only reconstructed 86 of 999 events. More confused than ever. Will look at it next week.

cherylepatrick commented 6 years ago

Ooh just noticed this:

[fatal:virtual base_module::process_status dpp::chain_module::process(::datatools::things &):185] Module 'GammaClusterizer' failed to process event record; message is '[virtual double snemo::reconstruction::gamma_clustering_driver::_get_tof_probability(const snemo::datamodel::calibrated_calorimeter_hit &, const snemo::datamodel::calibrated_calorimeter_hit &) const:512: Current geom id '[195823604:3850979413.440.2428722432]' does not match any scintillator block !]'

drbenmorgan commented 6 years ago

Hi @cherylepatrick, I haven't had time to look at this in detail, but have been able to partially reproduce the error in the gamma clustering test. This is on the current develop branch, but gamma clustering hasn't been touched since 3.2.0, so it should be reproducible there too, although it still seems to happen randomly.

All I did was, using the tip of the Falaise develop branch:

  1. Create a build directory and build Falaise in it with FALAISE_ENABLE_TESTING set to ON.
  2. Run the test directly from the build directory, e.g.

    $ ./BuildProducts/bin/fltests/modules/falaisegammaclusteringplugin-test_gamma_clustering_driver 
    ...

Sometimes it'll succeed, with output (untrimmed for clarity):

[debug:virtual int snemo::processing::base_gamma_builder::_post_process(const base_gamma_builder::hit_collection_type &, snemo::datamodel::particle_track_data &):337] No charged particles have been found !
|-- Particle(s) : 3
|   |-- Particle #0 : 
|   |   |-- Store       : (1)
|   |   |-- Hit ID      : 1
|   |   |-- Geometry ID : <none>
|   |   |-- Auxiliaries : <empty>
|   |   |-- Track ID : 1
|   |   |-- Trajectory : <No>
|   |   |-- Particle charge : neutral
|   |   |-- Vertices : 2
|   |   |   |-- Vertex Id=-1 @ (496,-2201.5,-518) mm (calo)
|   |   |   `-- Vertex Id=-1 @ (496,-2201.5,-259) mm (calo)
|   |   `-- Associated calorimeter hit(s) : 2
|   |       |-- Hit Id=-1 @ [1302:0.1.1.4.*]
|   |       `-- Hit Id=-1 @ [1302:0.1.1.5.*]
|   |-- Particle #1 : 
|   |   |-- Store       : (1)
|   |   |-- Hit ID      : 2
|   |   |-- Geometry ID : <none>
|   |   |-- Auxiliaries : <empty>
|   |   |-- Track ID : 2
|   |   |-- Trajectory : <No>
|   |   |-- Particle charge : neutral
|   |   |-- Vertices : 1
|   |   |   `-- Vertex Id=-1 @ (-496,-2201.5,-259) mm (calo)
|   |   `-- Associated calorimeter hit(s) : 1
|   |       `-- Hit Id=-1 @ [1302:0.0.1.5.*]
|   `-- Particle #2 : 
|       |-- Store       : (1)
|       |-- Hit ID      : 3
|       |-- Geometry ID : <none>
|       |-- Auxiliaries : <empty>
|       |-- Track ID : 3
|       |-- Trajectory : <No>
|       |-- Particle charge : neutral
|       |-- Vertices : 1
|       |   `-- Vertex Id=-1 @ (-496,-1424.5,518) mm (calo)
|       `-- Associated calorimeter hit(s) : 1
|           `-- Hit Id=-1 @ [1302:0.0.4.8.*]
|-- Unassociated calorimeter(s) : 0
`-- Auxiliaries : <empty>
    `-- <no property>

If I keep running it, it eventually segfaults with:

[debug:virtual int snemo::processing::base_gamma_builder::_prepare_process(const base_gamma_builder::hit_collection_type &, snemo::datamodel::particle_track_data &):312] Number of calorimeter hits used: 4 (0 skipped)

... cling errors which are unrelated ...

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[/usr/lib/system/libsystem_kernel.dylib] mach_vm_map (no debug info)
... long backtrace ...
[/usr/lib/system/libdyld.dylib] start (no debug info)
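
Since the failure is intermittent, re-running by hand gets tedious. Below is a minimal sketch of a loop that repeats a command until it fails. It assumes the crash shows up as a non-zero exit status (which is what a shell normally reports for a process killed by a segfault). The function name `run_until_fail` and the `LOG_FILE` variable are made up for illustration and are not part of Falaise.

```shell
# Re-run a command until it exits non-zero or the attempt limit is reached.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# On the first failure, prints the 1-based attempt number and returns 0.
# Returns 1 if every attempt succeeded.
# LOG_FILE (hypothetical, defaults to last_run.log) is overwritten on each
# attempt, so after a failure it holds the output of the failing run.
run_until_fail() {
    max_attempts=$1
    shift
    log_file=${LOG_FILE:-last_run.log}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if ! "$@" >"$log_file" 2>&1; then
            echo "$attempt"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, using the test path quoted earlier in this comment:
#   run_until_fail 50 ./BuildProducts/bin/fltests/modules/falaisegammaclusteringplugin-test_gamma_clustering_driver
```

Keeping only the most recent log is a deliberate choice here: the successful runs all print the same thing, so the failing run's output is the only one worth reading.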

To track the issue further, I ran the test in lldb (assuming Mac; gdb on Linux will also work), just using run until the segfault occurs:

$ lldb ./BuildProducts/bin/fltests/modules/falaisegammaclusteringplugin-test_gamma_clustering_driver
(lldb) run
Process 55634 launched: './BuildProducts/bin/fltests/modules/falaisegammaclusteringplugin-test_gamma_clustering_driver' (x86_64)
Test program for the 'gamma_clustering_driver' class.
Use GC driver with default configuration
[debug:void emfield::geom_map::_construct(const datatools::properties &):95] Geometry volume/EM Field map: 
[debug]: |-- Name : 'associations.labels'
[debug]: |   |-- Description  : 'The list of labelled associations between some logical volumes and magnetic field objects'
[debug]: |   |-- Type  : string[1] (vector)
[debug]: |   `-- Value[0] : 'module'
[debug]: |-- Name : 'associations.module.field_name'
[debug]: |   |-- Description  : 'The magnetic field associated for the label "module"'
[debug]: |   |-- Type  : string (scalar)
[debug]: |   `-- Value : 'Bz_25gauss'
[debug]: |-- Name : 'associations.module.volume'
[debug]: |   |-- Description  : 'The logical model associated for the label "module"'
[debug]: |   |-- Type  : string (scalar)
[debug]: |   `-- Value : 'module_basic.model.log'
[debug]: `-- Name : 'debug'
[debug]:     |-- Description  : 'Debug flag of the geometry volume/field associations map :'
[debug]:     |-- Type  : boolean (scalar)
[debug]:     `-- Value : 1
[debug:void emfield::geom_map::_construct(const datatools::properties &):103] Number of geometry volume/field associations : 1
[debug:void emfield::geom_map::_construct(const datatools::properties &):107] Processing geometry volume/field association labelled 'module'...
[notice:void emfield::geom_map::_construct(const datatools::properties &):139] Add the EM association entry 'module' with EM field 'Bz_25gauss' associated to the logical volume 'module_basic.model.log'.
[debug:void snemo::processing::base_gamma_builder::_initialize(const datatools::properties &):91] Find locator plugin with name = locators_driver
[debug:virtual int snemo::processing::base_gamma_builder::_prepare_process(const base_gamma_builder::hit_collection_type &, snemo::datamodel::particle_track_data &):312] Number of calorimeter hits used: 4 (0 skipped)
Process 55634 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x117aa0000)
    frame #0: 0x0000000101296e65 libBayeux.3.dylib`geomtools::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, geomtools::geom_id const&) + 120
libBayeux.3.dylib`geomtools::operator<<:
->  0x101296e65 <+120>: movl   (%rax,%r15,4), %esi
    0x101296e69 <+124>: cmpl   $-0x2, %esi
    0x101296e6c <+127>: je     0x101296e84               ; <+151>
    0x101296e6e <+129>: cmpl   $-0x1, %esi
Target 0: (falaisegammaclusteringplugin-test_gamma_clustering_driver) stopped.
(lldb) 

Now use backtrace to see where the issue occurred:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x117aa0000)
  * frame #0: 0x0000000101296e65 libBayeux.3.dylib`geomtools::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, geomtools::geom_id const&) + 120
    frame #1: 0x00000001000c88c4 libFalaise_GammaClustering.dylib`snemo::reconstruction::gamma_clustering_driver::_get_tof_probability(this=<unavailable>, head_end_calo_hit_=0x0000000107608fa0, tail_begin_calo_hit_=<unavailable>) const at gamma_clustering_driver.cc:511 [opt]
    frame #2: 0x00000001000c7a04 libFalaise_GammaClustering.dylib`snemo::reconstruction::gamma_clustering_driver::_get_tof_association(this=0x00007ffeefbfe690, the_reconstructed_clusters=size=1, the_reconstructed_gammas=size=1) const at gamma_clustering_driver.cc:377 [opt]
    frame #3: 0x00000001000c3bf3 libFalaise_GammaClustering.dylib`snemo::reconstruction::gamma_clustering_driver::_process_algo(this=<unavailable>, calo_hits_=size=1, ptd_=0x00007ffeefbfe578) at gamma_clustering_driver.cc:159 [opt]
    frame #4: 0x00000001002a46ce libFalaise.3.dylib`snemo::processing::base_gamma_builder::process(this=0x00007ffeefbfe690, calo_hits_=size=1, ptd_=0x00007ffeefbfe578) at base_gamma_builder.cc:267 [opt]
    frame #5: 0x000000010000d45d falaisegammaclusteringplugin-test_gamma_clustering_driver`main at test_gamma_clustering_driver.cxx:97 [opt]
    frame #6: 0x00007fff769ee015 libdyld.dylib`start + 1
    frame #7: 0x00007fff769ee015 libdyld.dylib`start + 1
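
For anyone trying this on Linux, roughly the same run-then-backtrace session can be done non-interactively with gdb. This is a sketch only, assuming gdb is installed; it uses gdb's standard `-batch`, `-ex` and `--args` options. The function name `backtrace_on_crash` and the `DEBUGGER` override variable are hypothetical names chosen for this example.

```shell
# Run a program under gdb in batch mode: start it, and if it stops on a
# crash, print the backtrace, then exit.
# Usage: backtrace_on_crash PROGRAM [ARGS...]
# DEBUGGER is a hypothetical override for this sketch; it defaults to gdb.
backtrace_on_crash() {
    "${DEBUGGER:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Example, using the test path quoted earlier in this comment:
#   backtrace_on_crash ./BuildProducts/bin/fltests/modules/falaisegammaclusteringplugin-test_gamma_clustering_driver
```

If the program happens to exit normally on a given run, there is no stack left to print, so this only produces a useful backtrace on the runs that actually crash.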

So the error's occurring, randomly, at line 511 of gamma_clustering_driver.cc, which is this block:

https://github.com/SuperNEMO-DBD/Falaise/blob/0cb5bdc95ea7fe4d50b32092e7f2c8ba53ac5914/modules/GammaClustering/source/falaise/snemo/reconstruction/gamma_clustering_driver.cc#L502-L513

The bt would seem to indicate some issue with the tail_gid object, but I can't yet see what that is, nor where the random behaviour comes from.

cherylepatrick commented 6 years ago

Running with a config file from @Tedjditi I still see problems with that same bit of code, but this time it reports it properly as it crashes. Geom ID here is clearly garbage:

[fatal:virtual base_module::process_status dpp::chain_module::process(::datatools::things &):185] Module 'GammaClustering' failed to process event record; message is '[virtual double snemo::reconstruction::gamma_clustering_driver::_get_tof_probability(const snemo::datamodel::calibrated_calorimeter_hit &, const snemo::datamodel::calibrated_calorimeter_hit &) const:512: Current geom id '[40533748:3850979413.440.2428722432]' does not match any scintillator block !]'

(Those numbers should be something like [1302:0.0.0].) I got one where the list was insane, from running on the same file as above, with the same config:

[fatal:virtual base_module::process_status dpp::chain_module::process(::datatools::things &):185] Module 'GammaClustering' failed to process event record; message is '[virtual double snemo::reconstruction::gamma_clustering_driver::_get_tof_probability(const snemo::datamodel::calibrated_calorimeter_hit &, const snemo::datamodel::calibrated_calorimeter_hit &) const:512: Current geom id '[88958198:3850979413.2303217747.284935675.233322563.2231352149.3309073600.1480790267.3142909928.1958774015.284935470.4058538051.2231352148.3307238592.1615007995.3142902760.1958774015.3280160786.3750316096.147096392.300506459.822088682.3296938176.3277675272.3850979413.1213421121.2370370441.3051896955.1224741865.190.4160749568.1938377855.2072856664.1416357984.2370371515.1659398267.1291828052.1211659145.1096540041.3723058526.2415924101.3850979413.2303217747.2072856827.1413474400.2202599355.2303224003.3296938207.3915209480.4290466861.3850979413.1078431048.1213580125.2370364809.3277668423.3850979413.1192360901.301712704.4224010319.1565546257.1213567171.4173718921.4173661712.1212172049.1209026187.1565542281.1213567171.4224050569.1563969296.1213567171.4224050569.1563969297.1213567171.4224050569.1566066448.1213567171.4224050569.1566066449.121 .... pages of more numbers ... .4294966912.924448744.3180152832.4294967040.924499432.2106279936.129165520]' does not match any scintillator block !]'

I think maybe I need to try and get the debugger working. This is on Mac - I've asked Hamzah to take a look at how it goes on linux (we have a Linux shared setup here at UCL). Just noting this here so I don't forget.

drbenmorgan commented 5 years ago

Going to assign this to @cherylepatrick and @Tedjditi to solve. Whilst it appears Mac specific, that's indicative of some underlying issue in the logic or whatever randomisation is used.

lemiere commented 4 years ago

Is there any fresh status on this?

cherylepatrick commented 4 years ago

The person to ask is @Tedjditi ! Any news, Hichem?