DJDuque opened 8 months ago
I can't reproduce the bug:
for i in {1..200}; do alpha-g-vertices -o test_${i} /daq/alpha_data0/acapra/alphag/midasdata/run09570sub046.mid.lz4; done
It ran without issues all 200 times.
I'll try just the last 2% of the events in that file.
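Selecting "the last 2%" of a file's events comes down to integer arithmetic on the event count. The sketch below is minimal and rests on two assumptions. It uses a hypothetical `last_percent` helper, which is not part of `midasio` or the alpha-g crates. A plain `Vec<u32>` stands in for the events:

```rust
/// Returns the trailing `percent`% of `items`, rounded down to whole items.
/// Hypothetical helper for illustration only.
fn last_percent<T>(items: &[T], percent: usize) -> &[T] {
    assert!(percent <= 100, "percent must be at most 100");
    // Integer division rounds the number of kept items down.
    let keep = items.len() * percent / 100;
    &items[items.len() - keep..]
}

fn main() {
    // Stand-in for the events of a MIDAS file (serial numbers only).
    let events: Vec<u32> = (0..11433).collect();
    let tail = last_percent(&events, 2);
    println!(
        "{} events, starting at index {}",
        tail.len(),
        events.len() - tail.len()
    );
}
```

This thread later puts the file at 11433 events. At that count the tail would hold 228 events, because integer division rounds down.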
I ran that file in an infinite loop with:
use alpha_g_detector::midas::EventId;
use alpha_g_physics::MainEvent;
use rayon::prelude::*;
// Dummy structure that prints the event serial number on panic.
struct Foo {
    serial_number: u32,
}
impl Drop for Foo {
    fn drop(&mut self) {
        if std::thread::panicking() {
            println!("panicked on event {}", self.serial_number);
        }
    }
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The default 2 MiB stack size for threads is not enough.
    rayon::ThreadPoolBuilder::new()
        .stack_size(4 * 1024 * 1024)
        .build_global()?;
    let contents = std::fs::read("run09570sub046.mid")?;
    let file_view = midasio::FileView::try_from(&contents[..])?;
    let run_number = file_view.run_number();
    let main_events: Vec<_> = file_view
        .into_iter()
        .filter(|event| matches!(EventId::try_from(event.id()), Ok(EventId::Main)))
        .collect();
    let mut counter = 0;
    loop {
        counter += 1;
        println!("Iteration number {}", counter);
        main_events.clone().into_par_iter().for_each(|event| {
            let serial_number = event.serial_number();
            // This is dropped at the end of the closure, printing the serial
            // number if the thread panics.
            let _f = Foo { serial_number };
            let banks = event
                .into_iter()
                .map(|bank| (bank.name(), bank.data_slice()));
            if let Ok(event) = MainEvent::try_from_banks(run_number, banks) {
                let _vertex = event.vertex();
            }
        });
    }
}
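The `Foo` guard above works because `Drop` also runs while a thread unwinds. Below is a minimal sketch of the same technique using only the standard library. Its `process` function is hypothetical and fails on one specific input, so you can see when the guard reports and when it stays silent:

```rust
use std::panic;

/// Guard that reports which event was being processed if the current thread
/// unwinds while the guard is alive.
struct PanicReporter {
    serial_number: u32,
}

impl Drop for PanicReporter {
    fn drop(&mut self) {
        // `drop` runs both on normal scope exit and during unwinding;
        // `panicking()` is true only in the second case.
        if std::thread::panicking() {
            eprintln!("panicked on event {}", self.serial_number);
        }
    }
}

/// Hypothetical per-event work that fails on one specific input.
fn process(serial_number: u32) -> u32 {
    let _guard = PanicReporter { serial_number };
    let result: Option<u32> = if serial_number == 7 {
        None
    } else {
        Some(serial_number * 2)
    };
    result.unwrap()
}

fn main() {
    for serial_number in 0..10 {
        // catch_unwind keeps the loop going after the failing event.
        let outcome = panic::catch_unwind(|| process(serial_number));
        println!("event {serial_number}: ok = {}", outcome.is_ok());
    }
}
```

The guard must be bound to a named variable (`_f` above, `_guard` here). Writing `let _ = Foo { .. }` drops the guard immediately, so it would never see the panic. The technique also depends on unwinding: with `panic = "abort"`, destructors do not run.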
I was able to reproduce the bug after 400 iterations.
thread '<unnamed>' panicked at /home/djduque/.cargo/registry/src/index.crates.io-6f17d22bba15001f/alpha_g_physics-0.1.0/src/reconstruction/track_finding.rs:186:60:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
panicked on event 56507330
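Only the panic message survived this failure. A backtrace would have required `RUST_BACKTRACE=1`. When a bug shows up once in hundreds of iterations, it may be worth capturing a backtrace unconditionally. Below is a minimal sketch using only `std::panic::set_hook` and `std::backtrace::Backtrace::force_capture`. The hook name and the failing thread are illustrative and do not come from the alpha-g code:

```rust
use std::backtrace::Backtrace;
use std::panic;
use std::thread;

/// Installs a panic hook that always prints a backtrace, regardless of
/// whether RUST_BACKTRACE is set in the environment.
fn install_backtrace_hook() {
    panic::set_hook(Box::new(|info| {
        // force_capture records the stack even when RUST_BACKTRACE is unset.
        let backtrace = Backtrace::force_capture();
        let current = thread::current();
        let name = current.name().unwrap_or("<unnamed>");
        eprintln!("thread '{name}' {info}\n{backtrace}");
    }));
}

fn main() {
    install_backtrace_hook();
    // A worker thread that hits the same kind of failure as in the log above.
    let handle = thread::spawn(|| {
        let missing: Option<u32> = None;
        missing.unwrap()
    });
    // The hook prints the message and backtrace; join() then reports the panic.
    assert!(handle.join().is_err());
}
```

Setting `RUST_BACKTRACE=1` when you launch the loop achieves much the same with the default hook. How informative the frames are depends on the debug info in the build.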
I will leave it running overnight again to see whether it fails on another event or only on this one.
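The thread gives two numbers: 200 clean runs of `alpha-g-vertices`, then one failure after 400 iterations of the loop. A back-of-the-envelope sketch shows how they relate. It assumes each pass fails independently with probability p ≈ 1/400. That estimate comes from a single observation, and it treats a CLI run and a loop pass as comparable trials. Under those assumptions, the chance of 200 clean passes is (1 − 1/400)^200 ≈ 0.61. So the first 200 clean runs say little about whether the bug exists:

```rust
/// Probability of seeing zero failures in `n` independent passes when each
/// pass fails with probability `p`.
fn prob_no_failure(p: f64, n: u32) -> f64 {
    (1.0 - p).powi(n as i32)
}

fn main() {
    // Assumption: one failure in ~400 passes, i.e. p ≈ 1/400 per pass.
    let p = 1.0 / 400.0;
    for n in [200, 400, 1000] {
        println!("P(no failure in {n} passes) = {:.3}", prob_no_failure(p, n));
    }
}
```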
I iterated through that event millions of times with:
use alpha_g_physics::MainEvent;
use rayon::prelude::*;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The default 2 MiB stack size for threads is not enough.
    rayon::ThreadPoolBuilder::new()
        .stack_size(4 * 1024 * 1024)
        .build_global()?;
    let contents = std::fs::read("run09570sub046.mid")?;
    let file_view = midasio::FileView::try_from(&contents[..])?;
    let run_number = file_view.run_number();
    // The bug was observed in the last 1% of the events in this file.
    // There are 11433 events in this file.
    let [ref main_event] = file_view
        .into_iter()
        .filter(|event| event.serial_number() == 56507330)
        .collect::<Vec<_>>()[..]
    else {
        unreachable!();
    };
    rayon::iter::repeat(main_event).for_each(|event| {
        println!("{}", event.serial_number());
        let banks = event
            .into_iter()
            .map(|bank| (bank.name(), bank.data_slice()));
        if let Ok(event) = MainEvent::try_from_banks(run_number, banks) {
            let _vertex = event.vertex();
        }
    });
    Ok(())
}
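The same "hammer one input" approach can be sketched without rayon by using scoped threads from the standard library. This version also catches panics, so it can report whether any call failed and then stop the other threads. `reconstruct` and `stress` are hypothetical names that stand in for the real event decoding and vertex reconstruction. The 4 MiB stack mirrors the rayon pool setting above:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for `MainEvent::try_from_banks(..)` followed by `vertex()`.
fn reconstruct(event: &[u8]) -> usize {
    event.iter().filter(|&&byte| byte > 127).count()
}

/// Calls `reconstruct(event)` up to `iterations` times on each of `threads`
/// threads and stops all threads once any call panics.
/// Returns (number of calls made, whether a panic was observed).
fn stress(event: &[u8], threads: usize, iterations: u64) -> (u64, bool) {
    let calls = AtomicU64::new(0);
    let failed = AtomicBool::new(false);
    thread::scope(|s| {
        for _ in 0..threads {
            thread::Builder::new()
                // Same 4 MiB stack as the rayon pool above.
                .stack_size(4 * 1024 * 1024)
                .spawn_scoped(s, || {
                    for _ in 0..iterations {
                        if failed.load(Ordering::Relaxed) {
                            break;
                        }
                        calls.fetch_add(1, Ordering::Relaxed);
                        if panic::catch_unwind(|| reconstruct(event)).is_err() {
                            failed.store(true, Ordering::Relaxed);
                        }
                    }
                })
                .expect("failed to spawn thread");
        }
    });
    (calls.into_inner(), failed.into_inner())
}

fn main() {
    let event = vec![0u8, 200, 255, 3];
    let (calls, failed) = stress(&event, 8, 100_000);
    println!("{calls} calls, panic observed: {failed}");
}
```

`catch_unwind` only catches unwinding panics, so this sketch relies on the default `panic = "unwind"` strategy.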
I couldn't reproduce the bug a second time.
The reconstruction has been 100% deterministic since #138. If this shows up again, it will hopefully be reproducible, and I will fix it then.
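A determinism claim like this can be spot-checked. Run the same input many times in parallel and compare the results bit for bit. Below is a minimal std-only sketch. Its `reconstruct` is a hypothetical stand-in for the real vertex reconstruction, and the sketch assumes nothing about what #138 actually changed:

```rust
use std::thread;

/// Hypothetical reconstruction step: a sequential sum, so the result cannot
/// depend on how threads are scheduled.
fn reconstruct(hits: &[f64]) -> f64 {
    hits.iter().sum()
}

/// Runs `reconstruct` on `threads` threads and checks that every result is
/// bit-for-bit identical to a single-threaded reference run.
fn is_deterministic(hits: &[f64], threads: usize) -> bool {
    let reference = reconstruct(hits).to_bits();
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| s.spawn(move || reconstruct(hits).to_bits()))
            .collect();
        handles
            .into_iter()
            .all(|handle| handle.join().unwrap() == reference)
    })
}

fn main() {
    let hits = [0.1, 0.2, 0.3];
    println!("deterministic: {}", is_deterministic(&hits, 8));
    // Floating-point addition is not associative, so a reduction whose order
    // varies between runs can change the last bits of the result.
    println!("{} vs {}", (0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3));
}
```

Comparing `to_bits()` rather than values with a tolerance makes the check strict enough to catch order-dependent rounding. A common source of run-to-run differences in parallel numeric code is exactly that kind of order-dependent rounding.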
Reported by Andrea: