ALPHA-g-Experiment / alpha-g

Panic in `track_finding` #136

Open DJDuque opened 8 months ago

DJDuque commented 8 months ago

Reported by Andrea:

alpha-g-vertices --output R9570.csv $AGMIDASDATA/run09570sub*.mid.lz4
[46/78] Processing, ETA: 44m
[========================>] 99%, ETA: 1s    (/daq/alpha_data0/acapra/alphag/midasdata/run09570sub046.mid.lz4)
thread '<unnamed>' panicked at /home/acapra/.cargo/registry/src/index.crates.io-6f17d22bba15001f/alpha_g_physics-0.1.0/src/reconstruction/track_finding.rs:186:60:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

DJDuque commented 8 months ago

I can't reproduce the bug:

for i in {1..200}; do alpha-g-vertices -o test_${i} /daq/alpha_data0/acapra/alphag/midasdata/run09570sub046.mid.lz4; done

Runs without issue all 200 times.

I'll try just the last 2% of the events in that file.

DJDuque commented 8 months ago

Ran that file in an infinite loop with:

use alpha_g_detector::midas::EventId;
use alpha_g_physics::MainEvent;
use rayon::prelude::*;

// Dummy structure to print event serial number on panic.
struct Foo {
    serial_number: u32,
}

impl Drop for Foo {
    fn drop(&mut self) {
        if std::thread::panicking() {
            println!("panicked on event {}", self.serial_number);
        }
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The default 2 MiB stack size for threads is not enough.
    rayon::ThreadPoolBuilder::new()
        .stack_size(4 * 1024 * 1024)
        .build_global()?;

    let contents = std::fs::read("run09570sub046.mid")?;
    let file_view = midasio::FileView::try_from(&contents[..])?;
    let run_number = file_view.run_number();

    let main_events: Vec<_> = file_view
        .into_iter()
        .filter(|event| matches!(EventId::try_from(event.id()), Ok(EventId::Main)))
        .collect();

    let mut counter = 0;
    loop {
        counter += 1;
        println!("Iteration number {}", counter);

        main_events.clone().into_par_iter().for_each(|event| {
            let serial_number = event.serial_number();
            // This will get dropped at the end of scope; printing the serial
            // number if the program panics.
            let _f = Foo { serial_number };

            let banks = event
                .into_iter()
                .map(|bank| (bank.name(), bank.data_slice()));
            if let Ok(event) = MainEvent::try_from_banks(run_number, banks) {
                let _vertex = event.vertex();
            };
        });
    }
}

I was able to reproduce the bug after 400 iterations.

thread '<unnamed>' panicked at /home/djduque/.cargo/registry/src/index.crates.io-6f17d22bba15001f/alpha_g_physics-0.1.0/src/reconstruction/track_finding.rs:186:60:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
panicked on event 56507330

I will leave it running again overnight to see whether another event triggers the panic or it just fails on the same one.
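
The `Foo` guard above works because destructors still run while a thread unwinds, and `std::thread::panicking()` tells the destructor that it is running for that reason. Below is a minimal standard-library-only sketch of the same idea. `PanicReporter`, `process`, and `failing_events` are hypothetical names, and `process` is a stand-in that panics on one hard-coded ID; it is not the real reconstruction.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Mutex};

// Hypothetical guard: records its event ID if it is dropped while the
// current thread is unwinding from a panic.
struct PanicReporter {
    event_id: u32,
    failed: Arc<Mutex<Vec<u32>>>,
}

impl Drop for PanicReporter {
    fn drop(&mut self) {
        if std::thread::panicking() {
            // Tolerate a poisoned lock; we only want to append the ID.
            let mut failed = self.failed.lock().unwrap_or_else(|e| e.into_inner());
            failed.push(self.event_id);
        }
    }
}

// Stand-in for the real work (an assumption for illustration only):
// calls `Option::unwrap()` on `None` for event 3.
fn process(event_id: u32) -> u32 {
    let value = if event_id == 3 { None } else { Some(event_id * 2) };
    value.unwrap()
}

// Runs `process` on every ID and returns the IDs whose processing panicked.
fn failing_events(ids: &[u32]) -> Vec<u32> {
    let failed = Arc::new(Mutex::new(Vec::new()));
    for &id in ids {
        let failed = Arc::clone(&failed);
        // Catch the panic so the remaining events are still processed.
        let _ = panic::catch_unwind(AssertUnwindSafe(move || {
            let _guard = PanicReporter { event_id: id, failed };
            process(id)
        }));
    }
    let result = failed.lock().unwrap_or_else(|e| e.into_inner()).clone();
    result
}

fn main() {
    println!("panicked on events {:?}", failing_events(&[1, 2, 3, 4]));
}
```

The guard has to be bound to a named variable such as `_guard`; binding it to a bare `_` would drop it immediately, before the work that might panic.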

DJDuque commented 8 months ago

I iterated through that event millions of times with:

use alpha_g_physics::MainEvent;
use rayon::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The default 2 MiB stack size for threads is not enough.
    rayon::ThreadPoolBuilder::new()
        .stack_size(4 * 1024 * 1024)
        .build_global()?;

    let contents = std::fs::read("run09570sub046.mid")?;
    let file_view = midasio::FileView::try_from(&contents[..])?;
    let run_number = file_view.run_number();
    // The bug was observed in the last 1% of the events in this file.
    // There are 11433 events in this file.
    let [ref main_event] = file_view
        .into_iter()
        .filter(|event| event.serial_number() == 56507330)
        .collect::<Vec<_>>()[..]
    else {
        unreachable!();
    };

    rayon::iter::repeat(main_event).for_each(|event| {
        println!("{}", event.serial_number());
        let banks = event
            .into_iter()
            .map(|bank| (bank.name(), bank.data_slice()));
        if let Ok(event) = MainEvent::try_from_banks(run_number, banks) {
            let _vertex = event.vertex();
        };
    });

    Ok(())
}

I can't reproduce the bug again.
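
The `let [ref main_event] = ... else` construct above relies on a slice pattern: `[x]` matches only a slice of length one, so the `else` branch runs if the filter finds zero or several events. A small sketch of that pattern in isolation follows, using only the standard library; `exactly_one` is a hypothetical helper and not part of `midasio` or `alpha_g_physics`.

```rust
// Returns the only element matching `pred`, or `None` if zero or several
// elements match. Hypothetical helper for illustration.
fn exactly_one<T>(items: &[T], pred: impl Fn(&T) -> bool) -> Option<&T> {
    let matches: Vec<&T> = items.iter().filter(|item| pred(*item)).collect();
    // The slice pattern `[only]` matches only slices of length one.
    let [only] = matches.as_slice() else {
        return None;
    };
    Some(*only)
}

fn main() {
    let serial_numbers = [10u32, 20, 30, 20];
    println!("{:?}", exactly_one(&serial_numbers, |n| *n == 10));
    println!("{:?}", exactly_one(&serial_numbers, |n| *n == 20));
}
```

Returning `None` instead of calling `unreachable!()` is a design choice for the sketch; in a throwaway reproduction script, aborting on an unexpected match count is just as reasonable.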

DJDuque commented 7 months ago

The reconstruction has been 100% deterministic since #138. Whenever this shows up again, it will hopefully be reproducible, and I will fix it then.
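
The thread does not say what #138 changed, so the following is only a general illustration of how identical input can produce different results from one run to the next in Rust. One common source is iterating a `std::collections::HashMap`, whose order is unspecified; a `BTreeMap` always iterates in ascending key order. The function names and the "first candidate" scenario are hypothetical and not taken from `alpha_g_physics`.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical "pick the first candidate" step. The iteration order of a
// std HashMap is unspecified and can differ between map instances and runs.
fn first_id_unordered(candidates: &HashMap<u32, f64>) -> Option<u32> {
    candidates.keys().next().copied()
}

// Same step on a BTreeMap, which always iterates in ascending key order.
fn first_id_ordered(candidates: &BTreeMap<u32, f64>) -> Option<u32> {
    candidates.keys().next().copied()
}

fn main() {
    let pairs = [(7u32, 0.5f64), (3, 1.5), (9, 2.5)];
    let unordered: HashMap<u32, f64> = pairs.iter().copied().collect();
    let ordered: BTreeMap<u32, f64> = pairs.iter().copied().collect();
    println!("unordered first: {:?}", first_id_unordered(&unordered));
    println!("ordered first:   {:?}", first_id_ordered(&ordered));
}
```

With an ordered container, a failure on a given event repeats on every run, which is what makes a single-event replay like the one above useful.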