AdaptiveComputationLab / simcov

Bus error #10

Closed abbypribis closed 3 years ago

abbypribis commented 3 years ago

I encountered this error in the Slurm output file while running on Xena at CARC.

My run still completed, but I did not get the typical output message about how long it took and such.

[12/01/20 09:54:13 696.84s]: 39494 337944 451219 64110 19994688 1832 14437 1.74e-03 1.83e+01 < 0.699 0.789 >
[12/01/20 10:05:46 693.64s]: 40300 343744 457149 67264 20772526 1748 14488 1.77e-03 1.87e+01 < 0.711 0.805 >
[12/01/20 10:06:02 15.73s]: 40319 343835 457395 67065 20791232 1730 14476 1.77e-03 1.87e+01 < 0.712 0.806 >
/var/spool/slurm/d/job08214/slurm_script: line 12: 9606 Bus error upcxx-run -n $SLURM_NTASKS -N $SLURM_NNODES -- ./install/bin/simcov --config configs/ode_run_1_foi_2_inf_coords.config

Might be connected to #7?

stevenhofmeyr commented 3 years ago

This is usually a memory issue. If you're running on Xena, those nodes have very little memory, and they probably don't swap, so that's why it crashed. You'll need to run on multiple nodes or run a smaller simulation.
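If it helps to confirm the memory and swap situation on a compute node before resizing the run, here is a minimal sketch that summarizes `/proc/meminfo`. It assumes a Linux node. The helper names `parse_meminfo` and `memory_summary` are made up for illustration and are not part of simcov or UPC++. It only reports what the kernel exposes and says nothing about how much memory a given simcov configuration needs.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(text):
    """Return total/available memory in GiB and whether any swap is configured."""
    info = parse_meminfo(text)
    kb_per_gib = 1024 * 1024
    return {
        "total_gib": info.get("MemTotal", 0) / kb_per_gib,
        "available_gib": info.get("MemAvailable", 0) / kb_per_gib,
        "swap_enabled": info.get("SwapTotal", 0) > 0,
    }


if __name__ == "__main__":
    # Only meaningful on Linux; run it on the compute node (e.g. inside the
    # Slurm job), not on the login node, since the two may differ.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(memory_summary(f.read()))
```

Running this at the start of the job script, before the `upcxx-run` line, would show how much memory the node has and whether swap is enabled, which is the condition described above.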

abbypribis commented 3 years ago

Ok, I will run smaller simulations until I am able to run multi-node.