Open colin-daniels opened 6 years ago
It actually seems to fail harder if you let it run: g10-fail2.txt (exact same inputs as before).
Edit: This time it at least properly kills the other threads; for the first crash it didn't (or I killed them before it did).
There is something very seriously not right here. Looking into it.
The initial value of the potential seems to vary wildly, from numbers like +4000 eV to as much as +16000 eV, even when I am running single-process, single-threaded. This ought to be impossible; when run in serial, the code is supposed to be 100% deterministic up to the point where it first performs eigsh. (Correction: These observations were with OMP_NUM_THREADS=4.)
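A quick way to check the "deterministic in serial" expectation is to pull the first potential value out of several captured logs and compare them bit for bit. The sketch below is only an illustration: the helper names `initial_potential` and `all_bitwise_equal` are made up here and are not part of rsp2, and the parser assumes trace lines shaped like the `i: 0 v: ...` lines in the logs in this thread.

```rust
/// Extract the potential reported on the first minimizer trace line
/// (the one containing `i: 0 v: <value>`). Returns `None` if no such
/// line is found or the value does not parse as a float.
/// NOTE: hypothetical helper, written against the log format shown in this thread.
fn initial_potential(log: &str) -> Option<f64> {
    for line in log.lines() {
        let mut tokens = line.split_whitespace();
        while let Some(tok) = tokens.next() {
            if tok == "i:" {
                // Only accept the exact sequence `i: 0 v: <value>`.
                if tokens.next() != Some("0") {
                    break;
                }
                if tokens.next() != Some("v:") {
                    break;
                }
                return tokens.next()?.parse().ok();
            }
        }
    }
    None
}

/// True if every value has the same bit pattern as its neighbor.
/// Comparing bits (rather than using a tolerance) is deliberate:
/// a truly deterministic serial run should reproduce the value exactly.
fn all_bitwise_equal(values: &[f64]) -> bool {
    values.windows(2).all(|w| w[0].to_bits() == w[1].to_bits())
}
```

Feeding the captured stdout of a handful of runs through `initial_potential` and then `all_bitwise_equal` would flag the +4000 eV vs. +16000 eV kind of spread immediately, without having to wait for the relaxation to finish.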
Another weird output that might be a separate problem. Apparently my symmetry code can be inconsistent.
$ RUST_LOG=rsp2_minimize=trace cargo run --bin=rsp2 initial.structure -c settings.yaml -o out --force
Finished dev [unoptimized + debuginfo] target(s) in 0.16s
Running `/home/lampam/cpp/other/rust/rsp2/target/debug/rsp2 initial.structure -c settings.yaml -o out --force`
[ 0.283s][INFO] Available resources for parallelism:
[ 0.283s][INFO] MPI: 1 process(es)
[ 0.283s][INFO] OpenMP: 4 thread(s) per process (OMP_NUM_THREADS)
[ 0.283s][INFO] : 1 thread(s) in single-process tasks (RSP2_MAX_THREADS)
[ 0.283s][INFO] rayon: 1 thread(s) on the root process
[ 0.292s][WARN] 'lammps-update-style: fast' is experimental (this message will not be shown again)
[ 0.351s][TRACE] bond graph: intermediate supercell: [1, 1, 1], r = 1.70017
[ 0.351s][TRACE] bond graph: true supercell: centered_diagonal([1, 1, 1])
[ 16.026s][TRACE] Writing 'out/initial.structure'
[ 16.102s][TRACE] ============================
[ 16.103s][TRACE] Begin relaxation # 1
[ 16.443s][TRACE] i: 0 v: 16807.78272218822531 dv: +0.00e0 g: 5.4438451e2 00000011111111111122 192
[ 16.444s][DEBUG] Using steepest descent. (i: 1)
[ 16.926s][TRACE] i: 1 v: 16597.86604968855681 dv: -2.10e2 g: 3.4076611e2 00001111111111111112 96
[ 17.295s][TRACE] i: 2 v: 16368.45912211309769 dv: -2.29e2 g: 3.7902783e2 cos: +0.53 01111111111111122222 672
[ 17.737s][DEBUG] update_interval: Exit by strange guess (U0), (5.304989328257774e0, 7.439512162874651e0) vs 4.455803318164412e0
[ 17.973s][DEBUG] update_interval: Exit by strange guess (U0), (5.410382863194543e0, 5.817743649194453e0) vs 4.839036044166654e0
[ 18.097s][DEBUG] update_interval: Exit by strange guess (U0), (5.410382863194543e0, 5.6064040856796415e0) vs 1.161193949085568e0
[ 18.211s][DEBUG] update_interval: Exit by strange guess (U0), (5.508393474437092e0, 5.5998271153573045e0) vs 1.2181174603992762e0
[ 18.331s][DEBUG] update_interval: Exit by strange guess (U0), (5.508393474437092e0, 5.551078957093247e0) vs 1.8723516841905392e0
[ 18.379s][TRACE] i: 3 v: 15385.62764117925690 dv: -9.83e2 g: 1.2897596e3 cos: +0.79 +0.36 01111111111111222222 768
[ 18.542s][DEBUG] update_interval: Exit by strange guess (U0), (0e0, 7.902233502103718e-2) vs -5.014386492772605e0
[ 18.666s][DEBUG] update_interval: Exit by strange guess (U0), (0e0, 3.130444235201886e-2) vs -4.21061536510545e0
[ 18.786s][DEBUG] update_interval: Exit by strange guess (U0), (0e0, 1.241620158541894e-2) vs -3.7832530834179163e0
[ 18.949s][TRACE] i: 4 v: 15385.05297506718125 dv: -5.75e-1 g: 1.1538031e3 cos: +1.00 +0.79 +0.36 01111111111111222222 864
[ 19.511s][TRACE] i: 5 v: 14620.02690701393658 dv: -7.65e2 g: 7.8813361e2 cos: -0.09 -0.09 +0.06 00111111111111111222 384
[ 20.207s][DEBUG] update_interval: Exit by strange guess (U0), (1.4459285369837874e1, 1.9738699100999074e1) vs 1.1491684230228522e1
[ 20.326s][TRACE] i: 6 v: 4715.66708952707540 dv: -9.90e3 g: 2.1005543e3 cos: +0.55 +0.22 +0.22 00001111111111112222 480
[ 21.020s][TRACE] i: 7 v: 3303.16106153555302 dv: -1.41e3 g: 3.4190172e3 cos: +1.00 +0.55 +0.22 00001111111111112222 576
[ 21.690s][DEBUG] update_interval: Exit by strange guess (U0), (1.3718229976016578e0, 1.6326237921249265e0) vs 1.6928890566512633e0
[ 21.773s][TRACE] i: 8 v: 3138.65584232136280 dv: -1.65e2 g: 4.1522559e3 cos: +0.99 +0.99 +0.52 00000001111111112222 480
[ 22.861s][TRACE] i: 9 v: -1734.26338160659816 dv: -4.87e3 g: 2.4708546e3 cos: +0.92 +0.86 +0.85 00000000000111111122 192
[ 23.379s][TRACE] i: 10 v: -1862.24086489048318 dv: -1.28e2 g: 8.8354976e2 cos: +0.99 +0.89 +0.82 00000000111111111122 192
[ 24.045s][TRACE] i: 11 v: -2478.61552236364923 dv: -6.16e2 g: 1.5864535e2 cos: -0.11 -0.19 -0.29 00111111111111222222 864
[ 24.637s][DEBUG] update_interval: Exit by strange guess (U0), (1.5421424192197553e0, 1.6326237921249265e0) vs 1.6734195586776628e0
[ 24.640s][TRACE] i: 12 v: -2950.43877366409652 dv: -4.72e2 g: 3.0028254e2 cos: +0.17 +0.12 +0.08 00000001111111111122 192
[ 24.955s][DEBUG] update_interval: Exit by strange guess (U0), (0e0, 9.030988105274831e-2) vs -6.2074628941942915e0
[ 25.038s][TRACE] i: 13 v: -2953.14043478378971 dv: -2.70e0 g: 3.0963038e2 cos: +0.99 +0.15 +0.13 00000011111111112222 480
[ 25.858s][TRACE] i: 14 v: -3204.87836643366836 dv: -2.52e2 g: 1.2905360e2 cos: +0.72 +0.62 -0.02 00001111111111112222 480
[ 26.448s][TRACE] i: 15 v: -3283.03041176446686 dv: -7.82e1 g: 9.6511716e1 cos: -0.14 -0.64 -0.69 00111111111111112222 576
[ 27.191s][TRACE] i: 16 v: -3407.94390360426314 dv: -1.25e2 g: 1.1059690e2 cos: +0.55 +0.49 -0.23 00000001111111111122 288
[ 27.849s][TRACE] i: 17 v: -3476.73570961014138 dv: -6.88e1 g: 1.4579908e2 cos: +0.81 +0.22 +0.49 00000000000011111122 288
[ 28.518s][TRACE] i: 18 v: -3582.07776354725229 dv: -1.05e2 g: 9.2406217e1 cos: +0.69 +0.18 -0.14 00001111111111112222 480
[ 29.106s][TRACE] i: 19 v: -3618.42041234991757 dv: -3.63e1 g: 3.2919930e1 cos: +0.67 +0.47 +0.19 00000000001111112222 576
[ 29.890s][TRACE] i: 20 v: -3648.20756526457535 dv: -2.98e1 g: 4.6643024e1 cos: +0.59 +0.56 +0.89 11111111111111111122 288
[ 30.637s][TRACE] i: 21 v: -3667.12670506857512 dv: -1.89e1 g: 2.3726226e1 cos: +0.94 +0.50 +0.67 00111111111111112222 576
[ 31.070s][TRACE] i: 22 v: -3669.08878420527299 dv: -1.96e0 g: 6.2980158e0 cos: +0.79 +0.73 +0.19 00000000011111111122 288
[ 31.426s][TRACE] i: 23 v: -3669.40034712308443 dv: -3.12e-1 g: 1.8988176e0 cos: +0.11 -0.06 -0.02 00111111111111112222 576
[ 31.788s][TRACE] i: 24 v: -3669.41250008772295 dv: -1.22e-2 g: 7.4869188e-1 cos: +0.29 -0.00 +0.17 00111111111111111122 192
[ 32.144s][TRACE] i: 25 v: -3669.41955771251378 dv: -7.06e-3 g: 1.0505491e0 cos: +0.42 +0.11 +0.37 00000011111111111122 288
[ 32.656s][TRACE] i: 26 v: -3669.44137342401700 dv: -2.18e-2 g: 5.2767463e-1 cos: +0.83 +0.29 +0.13 00000011111111112222 576
[ 33.017s][TRACE] i: 27 v: -3669.44241150852667 dv: -1.04e-3 g: 2.5522894e-1 cos: +0.67 +0.55 +0.75 00001111111111112222 480
[ 33.369s][TRACE] i: 28 v: -3669.44279366960882 dv: -3.82e-4 g: 4.2268518e-2 cos: +0.59 +0.38 +0.46 00000001111111112222 480
[ 33.731s][TRACE] i: 29 v: -3669.44279965109536 dv: -5.98e-6 g: 1.3910208e-2 cos: -0.06 -0.35 +0.28 00000001111111111122 288
[ 34.007s][TRACE] i: 30 v: -3669.44280100071956 dv: -1.35e-6 g: 1.3180855e-2 cos: +0.32 -0.02 +0.56 00000001111111112222 480
[ 34.530s][TRACE] i: 31 v: -3669.44280573254218 dv: -4.73e-6 g: 1.3381393e-2 cos: +0.71 +0.22 +0.35 00000000001111111122 192
[ 34.885s][TRACE] i: 32 v: -3669.44280745922151 dv: -1.73e-6 g: 6.2512621e-3 cos: +0.82 +0.58 +0.04 00000000001111111122 288
[ 35.163s][TRACE] i: 33 v: -3669.44280757288016 dv: -1.14e-7 g: 4.8384000e-4 cos: +0.63 +0.52 +0.19 00000011111111111122 192
[ 35.447s][TRACE] i: 34 v: -3669.44280757383513 dv: -9.55e-10 g: 7.7220042e-5 cos: +0.10 +0.06 -0.17 00000011111111112222 480
[ 35.805s][TRACE] i: 35 v: -3669.44280757389106 dv: -5.59e-11 g: 8.9262403e-5 cos: +0.16 +0.02 -0.48 00000001111111112222 576
[ 36.242s][TRACE] i: 36 v: -3669.44280757394336 dv: -5.23e-11 g: 6.3562784e-5 cos: +0.76 +0.12 -0.57 00000000111111112222 576
[ 36.598s][TRACE] i: 37 v: -3669.44280757398565 dv: -4.23e-11 g: 1.1559662e-4 cos: +0.96 +0.67 +0.06 00000000111111111122 288
[ 37.114s][TRACE] i: 38 v: -3669.44280757408706 dv: -1.01e-10 g: 4.0457698e-5 cos: +0.92 +0.80 +0.67 00111111111111112222 576
[ 37.398s][TRACE] i: 39 v: -3669.44280757410115 dv: -1.41e-11 g: 2.7568275e-5 cos: +0.96 +0.79 +0.61 00000000000011111122 288
[ 37.827s][TRACE] i: 40 v: -3669.44280757410115 dv: +0.00e0 g: 3.4712491e-6 cos: +0.73 +0.69 +0.72 00000000111111111122 192
[ 38.140s][TRACE] i: 41 v: -3669.44280757410070 dv: +4.55e-13 g: 5.5512018e-7 cos: +0.18 +0.13 +0.22 00000001111111111122 288
[ 38.618s][TRACE] i: 42 v: -3669.44280757410161 dv: -9.09e-13 g: 3.0598803e-7 cos: +0.16 +0.03 +0.62 00000011111111112222 480
[ 39.014s][TRACE] i: 43 v: -3669.44280757410161 dv: +0.00e0 g: 3.4078309e-7 cos: +0.49 +0.08 +0.77 00011111111111122222 672
[ 39.446s][TRACE] i: 44 v: -3669.44280757410070 dv: +9.09e-13 g: 2.5830104e-7 cos: +0.79 +0.38 +0.36 00000000111111111222 384
[ 39.958s][TRACE] i: 45 v: -3669.44280757410115 dv: -4.55e-13 g: 2.7910936e-7 cos: +0.98 +0.72 +0.40 00000011111111111122 192
[ 40.267s][TRACE] i: 46 v: -3669.44280757410161 dv: -4.55e-13 g: 5.9695147e-8 cos: +0.79 +0.73 +0.70 00000000000111111122 288
[ 40.707s][TRACE] i: 47 v: -3669.44280757410161 dv: +0.00e0 g: 1.6299424e-8 cos: +0.33 +0.26 +0.36 00000000001111111122 288
[ 41.017s][TRACE] i: 48 v: -3669.44280757410070 dv: +9.09e-13 g: 6.3374115e-9 cos: +0.28 +0.09 +0.32 00000011111111111122 288
[ 41.332s][TRACE] i: 49 v: -3669.44280757410161 dv: -9.09e-13 g: 1.1303685e-8 cos: +0.38 +0.10 +0.88 00000011111111111122 288
[ 41.805s][TRACE] i: 50 v: -3669.44280757410161 dv: +0.00e0 g: 6.5506183e-9 cos: +0.89 +0.33 +0.46 00000001111111111122 192
[ 42.123s][TRACE] i: 51 v: -3669.44280757410115 dv: +4.55e-13 g: 1.7447008e-9 cos: +0.78 +0.69 +0.57 00000000011111111122 288
[ 42.609s][TRACE] i: 52 v: -3669.44280757410161 dv: -4.55e-13 g: 9.6728816e-10 cos: +0.39 +0.31 +0.51 00000011111111111122 245
[ 42.929s][TRACE] i: 53 v: -3669.44280757410252 dv: -9.09e-13 g: 1.4248736e-10 cos: +0.52 +0.20 +0.35 00000011111111111122 192
[ 43.363s][TRACE] i: 54 v: -3669.44280757410161 dv: +9.09e-13 g: 2.6479234e-10 cos: +0.17 +0.09 +0.81 00000011111111111122 203
[ 43.646s][TRACE] i: 55 v: -3669.44280757410206 dv: -4.55e-13 g: 1.9694943e-10 cos: +0.89 +0.15 +0.41 00001111111111111122 170
[ 44.037s][TRACE] i: 56 v: -3669.44280757410115 dv: +9.09e-13 g: 1.1031621e-10 cos: +0.84 +0.75 +0.63 00000000111111111122 209
[ 44.349s][TRACE] i: 57 v: -3669.44280757410070 dv: +4.55e-13 g: 7.8663621e-11 cos: +0.97 +0.70 +0.63 00000111111111111122 245
[ 44.671s][TRACE] i: 58 v: -3669.44280757410161 dv: -9.09e-13 g: 2.1707441e-11 cos: +0.71 +0.68 +0.66 00000111111111111112 124
[ 44.983s][TRACE] i: 59 v: -3669.44280757410161 dv: +0.00e0 g: 1.6307851e-11 cos: +0.36 +0.25 +0.33 00000000111111111112 24
[ 45.378s][TRACE] i: 60 v: -3669.44280757410161 dv: +0.00e0 g: 1.4125871e-11 cos: +0.62 +0.18 +0.32 00000001111111111112 41
[ 45.694s][TRACE] i: 61 v: -3669.44280757410161 dv: +0.00e0 g: 1.1995141e-11 cos: +0.73 +0.39 +0.33 00000001111111111112 122
[ 46.128s][TRACE] i: 62 v: -3669.44280757410161 dv: +0.00e0 g: 1.3708902e-11 cos: +0.72 +0.44 +0.42 00000000011111111112 26
[ 46.129s][INFO] ACGSD Finished.
[ 46.129s][INFO] Iterations: 62
[ 46.129s][INFO] Value: -3669.4428075741016
[ 46.129s][INFO] Grad Norm: 1.370890187584052e-11
[ 46.129s][INFO] Grad Max: 1.9766528305187246e-12
[ 46.130s][TRACE] ============================
[ 46.130s][TRACE] Writing 'out/ev-loop-01.1.structure'
[ 46.261s][TRACE] Computing symmetry
[ 46.640s][TRACE] Computing deperms in primitive cell
thread 'main' panicked at 'compute_stars: input deperms violate the group axioms!', src/tasks/math/stars.rs:66:13
note: Run with `RUST_BACKTRACE=1` for a backtrace.
[ 47.474s][INFO] successfully leaked tempdir at /tmp/rsp2-.a6h5k4pELaPO
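For context on the `compute_stars: input deperms violate the group axioms!` panic: the actual check in `stars.rs` is not shown in this thread, so the following is only a generic sketch of what "a set of permutations satisfies the group axioms" means. The function names are hypothetical, and the composition convention (apply `b` first, then `a`) is an assumption that may differ from rsp2's.

```rust
use std::collections::HashSet;

/// Compose two permutations: result[i] = a[b[i]] (apply `b` first, then `a`).
/// NOTE: this convention is an assumption made for the sketch.
fn compose(a: &[usize], b: &[usize]) -> Vec<usize> {
    b.iter().map(|&i| a[i]).collect()
}

/// True if `p` contains each index in 0..p.len() exactly once.
fn is_permutation(p: &[usize]) -> bool {
    let mut seen = vec![false; p.len()];
    p.iter()
        .all(|&i| i < seen.len() && !std::mem::replace(&mut seen[i], true))
}

/// True if `perms` is a group under composition: every element is a valid
/// permutation of the same length, the identity is present, and the set is
/// closed under composition. For a finite set of permutations, closure plus
/// the identity already implies that inverses are present.
fn forms_group(perms: &[Vec<usize>]) -> bool {
    if perms.is_empty() {
        return false;
    }
    let n = perms[0].len();
    if !perms.iter().all(|p| p.len() == n && is_permutation(p)) {
        return false;
    }
    let set: HashSet<&[usize]> = perms.iter().map(|p| p.as_slice()).collect();
    let identity: Vec<usize> = (0..n).collect();
    if !set.contains(identity.as_slice()) {
        return false;
    }
    perms
        .iter()
        .all(|a| perms.iter().all(|b| set.contains(compose(a, b).as_slice())))
}
```

If the relaxed positions differ slightly from run to run, permutations matched up with a tolerance could plausibly end up as an inconsistent set that fails a check of this kind, though that is a guess rather than a diagnosis.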
Well that's funny.
When I said I was running without threads, I forgot to reset OMP_NUM_THREADS. If I do reset OMP_NUM_THREADS, then the following behavior is observed:
Half of the time it dies with FloatIsNan, and the other half of the time, the initial energy is exactly -6919.09741961267173 (a reasonable value).
Anyways, something is clearly wrong with how rsp2 communicates with lammps. (surprising nobody)
I tried getting rid of create_atoms random + set atom in favor of create_atoms single, but to no avail.
Colin, you're not gonna like this.
Guess what happens when I use pair_style rebo instead of pair_style rebo/omp?
It works. Consistently.
Workaround provided in https://github.com/ExpHP/rsp2/commit/db0156ac242e99ef6de06817d39ffcd4037b4433 to be able to select rebo
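To illustrate the kind of switch involved, here is a minimal sketch of choosing between the plain and OpenMP-accelerated variants when building the LAMMPS `pair_style` command. The `Accel` enum and `pair_style_command` function are hypothetical names for this example only; they are not the setting or code introduced by the commit above.

```rust
/// Which variant of a pair style to request from LAMMPS.
/// NOTE: hypothetical type for illustration; not rsp2's actual setting.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Accel {
    /// The plain style, e.g. `rebo`.
    Plain,
    /// The OpenMP-accelerated style, e.g. `rebo/omp`.
    Omp,
}

/// Build the `pair_style` command line for the given base style name.
fn pair_style_command(base: &str, accel: Accel) -> String {
    match accel {
        Accel::Plain => format!("pair_style {}", base),
        Accel::Omp => format!("pair_style {}/omp", base),
    }
}
```

With something like this, `pair_style_command("rebo", Accel::Plain)` yields the command that behaved consistently in the tests above, while `Accel::Omp` yields the `rebo/omp` form that did not.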
Thanks for the workaround (that fixed that one at least). Unfortunately, these structures seem to be the gifts that just keep giving. A new error for these inputs:
[ 0.228s][INFO] Available resources for parallelism:
[ 0.229s][INFO] MPI: 1 process(es)
[ 0.229s][INFO] OpenMP: 1 thread(s) per process (OMP_NUM_THREADS)
[ 0.229s][INFO] : 4 thread(s) in single-process tasks (RSP2_MAX_THREADS)
[ 0.229s][INFO] rayon: 4 thread(s) on the root process
[ 0.230s][WARN] 'lammps-update-style: fast' is experimental (this message will not be shown again)
[ 0.231s][TRACE] bond graph: intermediate supercell: [1, 1, 1], r = 1.70017
[ 0.232s][TRACE] bond graph: true supercell: centered_diagonal([1, 1, 1])
[ 0.308s][TRACE] Writing 'gyroid15/initial.structure'
[ 0.373s][TRACE] ============================
[ 0.373s][TRACE] Begin relaxation # 1
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1440`,
right: `1416`', src/io/lammps/lib.rs:1073:9
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::print
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: std::panicking::default_hook::{{closure}}
at libstd/panicking.rs:211
3: std::panicking::default_hook
at libstd/panicking.rs:227
4: std::panicking::rust_panic_with_hook
at libstd/panicking.rs:475
5: std::panicking::continue_panic_fmt
at libstd/panicking.rs:390
6: std::panicking::begin_panic_fmt
at libstd/panicking.rs:345
7: <rsp2_lammps_wrap::Lammps<P>>::update_computation
8: <<rsp2_tasks::potential::lammps::Builder<P>>::lammps_diff_fn::MyDiffFn<Mm> as rsp2_tasks::potential::DiffFn<Mm>>::compute
9: rsp2_tasks::potential::PotentialBuilder::initialize_flat_diff_fn::{{closure}}
10: rsp2_minimize::acgsd::_acgsd::{{closure}}
11: rsp2_minimize::acgsd::_acgsd
12: rsp2_minimize::acgsd::acgsd
13: rsp2_tasks::cmd::relaxation::do_relax
14: rsp2_tasks::cmd::<impl rsp2_tasks::cmd::trial::TrialDir>::run_relax_with_eigenvectors
15: <rsp2_lammps_wrap::low_level::mpi_helper::MpiOnDemand<D>>::install
16: rsp2_tasks::entry_points::wrap_main_with_lammps_on_demand
17: rsp2_tasks::entry_points::wrap_main
18: rsp2_tasks::entry_points::_rsp2_acgsd
19: rsp2_tasks::entry_points::rsp2
20: std::rt::lang_start::{{closure}}
21: std::panicking::try::do_call
at libstd/rt.rs:59
at libstd/panicking.rs:310
22: __rust_maybe_catch_panic
at libpanic_unwind/lib.rs:105
23: std::rt::lang_start_internal
at libstd/panicking.rs:289
at libstd/panic.rs:392
at libstd/rt.rs:58
24: main
25: __libc_start_main
26: _start
[ 0.460s][INFO] successfully leaked tempdir at /tmp/rsp2-.cyRoFAyIZGyZ
Bonus segfault if I try to use mpirun with omp off on this input:
Edit: command line OMP_NUM_THREADS=1 mpirun -np 4 rsp2 -c input.yaml -o g10-segfault --force g10-relaxed.vasp
[ 0.216s][INFO] Available resources for parallelism:
[ 0.216s][INFO] MPI: 4 process(es)
[ 0.216s][INFO] OpenMP: 1 thread(s) per process (OMP_NUM_THREADS)
[ 0.216s][INFO] : 4 thread(s) in single-process tasks (RSP2_MAX_THREADS)
[ 0.216s][INFO] rayon: 4 thread(s) on the root process
[ 0.218s][WARN] 'lammps-update-style: fast' is experimental (this message will not be shown again)
[ 0.219s][TRACE] bond graph: intermediate supercell: [1, 1, 1], r = 1.70017
[ 0.219s][TRACE] bond graph: true supercell: centered_diagonal([1, 1, 1])
[ 0.254s][TRACE] Writing 'g10-segfault/initial.structure'
[ 0.298s][TRACE] ============================
[ 0.298s][TRACE] Begin relaxation # 1
[ 0.333s][TRACE] i: 0 v: -6923.85556855887171 dv: +0.00e0 g: 2.5554223e-4 00000000000000000112 3
[ 0.333s][DEBUG] Using steepest descent. (i: 1)
[ 0.338s][TRACE] i: 1 v: -6923.85556855640607 dv: +2.47e-9 g: 1.6012033e-4 00000000000000111112 3
[ 0.343s][TRACE] i: 2 v: -6923.85556856183484 dv: -5.43e-9 g: 9.9874784e-5 cos: +0.53 00000000001111111112 9
[ 0.349s][TRACE] i: 3 v: -6923.85556855880714 dv: +3.03e-9 g: 1.0545607e-4 cos: +0.59 +0.31 00000000001111111112 4
[ 0.354s][TRACE] i: 4 v: -6923.85556856332551 dv: -4.52e-9 g: 1.0082739e-4 cos: +0.80 +0.47 +0.25 00000000011111111112 10
[ 0.357s][TRACE] i: 5 v: -6923.85556856646417 dv: -3.14e-9 g: 1.5434653e-4 cos: +0.94 +0.72 +0.43 00000000000000011112 6
[ 0.363s][TRACE] i: 6 v: -6923.85556856235507 dv: +4.11e-9 g: 9.3231352e-5 cos: +0.96 +0.90 +0.70 00000011111111111112 11
[ 0.367s][TRACE] i: 7 v: -6923.85556856504900 dv: -2.69e-9 g: 1.4937256e-4 cos: +0.99 +0.93 +0.84 00000000000000111112 7
[ 0.377s][TRACE] i: 8 v: -6923.85556856378753 dv: +1.26e-9 g: 1.2081709e-4 cos: +0.96 +0.93 +0.90 00000000011111111112 12
[ 0.383s][TRACE] i: 9 v: -6923.85556856064704 dv: +3.14e-9 g: 1.2126181e-4 cos: +0.95 +0.92 +0.87 00000000000011111112 7
[ 0.388s][TRACE] i: 10 v: -6923.85556856454059 dv: -3.89e-9 g: 1.2523105e-4 cos: +0.95 +0.91 +0.83 00000000000011111112 7
[ 0.393s][TRACE] i: 11 v: -6923.85556855894174 dv: +5.60e-9 g: 1.1327453e-4 cos: +0.96 +0.92 +0.83 00000000001111111112 11
[ 0.397s][TRACE] i: 12 v: -6923.85556856279800 dv: -3.86e-9 g: 1.5823793e-4 cos: +0.96 +0.92 +0.84 00000000000000111112 7
[ 0.402s][TRACE] i: 13 v: -6923.85556855827599 dv: +4.52e-9 g: 1.1697806e-4 cos: +0.98 +0.94 +0.86 00000000111111111112 10
[ 0.407s][TRACE] i: 14 v: -6923.85556855843970 dv: -1.64e-10 g: 1.9073105e-4 cos: +0.99 +0.96 +0.90 00000000000000111112 6
[ 0.412s][TRACE] i: 15 v: -6923.85556856436051 dv: -5.92e-9 g: 1.4542335e-4 cos: +0.98 +0.95 +0.94 00000000001111111112 7
[ 0.417s][TRACE] i: 16 v: -6923.85556856625590 dv: -1.90e-9 g: 1.3379912e-4 cos: +0.97 +0.95 +0.91 00000000000111111112 6
[ 0.426s][TRACE] i: 17 v: -6923.85556855889081 dv: +7.37e-9 g: 2.0577697e-4 cos: +0.96 +0.93 +0.90 00000000000000001112 10
[ 0.430s][TRACE] i: 18 v: -6923.85556856400763 dv: -5.12e-9 g: 1.5059070e-4 cos: +0.99 +0.95 +0.90 00000000011111111112 11
[ 0.436s][TRACE] i: 19 v: -6923.85556856325184 dv: +7.56e-10 g: 1.8905567e-4 cos: +0.97 +0.96 +0.91 00000000011111111112 8
[ 0.441s][TRACE] i: 20 v: -6923.85556856813037 dv: -4.88e-9 g: 2.4710529e-4 cos: +0.98 +0.96 +0.93 00000000000000011112 8
[ 0.446s][TRACE] i: 21 v: -6923.85556856199128 dv: +6.14e-9 g: 1.6636304e-4 cos: +0.99 +0.97 +0.94 00000000001111111112 5
[ 0.450s][TRACE] i: 22 v: -6923.85556856385665 dv: -1.87e-9 g: 2.0753738e-4 cos: +0.98 +0.97 +0.95 00000000000011111112 6
[ 0.455s][TRACE] i: 23 v: -6923.85556856283074 dv: +1.03e-9 g: 2.0287575e-4 cos: +0.99 +0.97 +0.95 00000000000011111112 9
[ 0.461s][TRACE] i: 24 v: -6923.85556856100993 dv: +1.82e-9 g: 1.4585492e-4 cos: +0.99 +0.98 +0.95 00000000111111111112 4
[ 0.461s][INFO] ACGSD Finished.
[ 0.461s][INFO] Iterations: 24
[ 0.461s][INFO] Value: -6923.85556856101
[ 0.461s][INFO] Grad Norm: 1.458549176948771e-4
[ 0.461s][INFO] Grad Max: 2.493511354585698e-5
[ 0.461s][TRACE] ============================
[ 0.461s][TRACE] Writing 'g10-segfault/ev-loop-01.1.structure'
[ 0.528s][TRACE] Computing symmetry
[ 1.182s][TRACE] Computing deperms in primitive cell
[ 1.182s][DEBUG] Surveying displacement implementations:
[ 1.183s][DEBUG] axial: Produces 5760
[ 1.185s][DEBUG] diag: Produces 5760
[ 1.191s][DEBUG] diag-2: Produces 5760
[ 1.197s][TRACE] num spacegroup ops: 1
[ 1.197s][TRACE] num displacements: 5760
[ 1.197s][TRACE] Computing forces at displacements
disp 5760 of 5760
[ 7.547s][TRACE] Done computing forces at displacements
[ 7.547s][TRACE] Computing deperms in supercell
[ 7.548s][TRACE] Computing sparse force constants
[ 7.577s][TRACE] Computing sparse dynamical matrix
[ 9.680s][TRACE] nnz: 18720 out of 921600 blocks (matrix density: 2.031e-2)
[ 9.680s][TRACE] Diagonalizing dynamical matrix
[ 9.680s][TRACE] Computing most negative eigensolutions.
[ 10.513s][WARN] trace: precomputing OPinv for shift-invert
[ 10.832s][WARN] /usr/lib/python3.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py:295: SparseEfficiencyWarning: splu requires CSC matrix format
[ 10.832s][WARN] warn('splu requires CSC matrix format', SparseEfficiencyWarning)
[ 10.832s][WARN] trace: shift-invert call 1
[ 13.529s][WARN] trace: shift-invert call 2
[ 16.432s][WARN] trace: shift-invert call 3
[ 19.144s][WARN] trace: shift-invert call 4
[ 22.478s][WARN] Good -- Bad (Old Wrong OrthoFail OrthoBad)
[ 22.478s][WARN] 3 -- 1 ( 0 1 0 0 )
[ 22.478s][WARN] 0 -- 4 ( 3 1 0 0 )
[ 22.478s][WARN] 0 -- 4 ( 3 1 0 0 )
[ 22.478s][WARN] 0 -- 3 ( 3 0 0 0 )
[ 22.479s][WARN] trace: trying non-shift-invert
[ 25.727s][TRACE] Done diagonalizing dynamical matrix
[ 25.729s][TRACE] ============================
[ 25.729s][TRACE] Finished diagonalization
[ 27.721s][TRACE] Computing eigensystem info
[ 27.721s][TRACE] computing EvAcousticness
[ 27.721s][TRACE] computing EvPolarization
[ 27.722s][TRACE] not computing EvLayerAcousticness due to missing requirement SiteLayers
[ 27.722s][TRACE] not computing UnfoldProbs due to missing requirement SiteLayers
[ 27.722s][TRACE] computing EvRamanTensors
[ 27.723s][INFO] # (C) Frequency(cm-1) Acoust. RamnA RamnB [ X , Y , Z ]
[ 27.723s][INFO] (T) -0.2451772453052291 1-1e-06 0 0 [0.00, 0.00, 1.00]
[ 27.723s][INFO] (T) -0.0010127036152851687 1-1e-11 0 0 [0.83, 0.17, 0.00]
[ 27.724s][INFO] (T) -0.0008918143931896466 1-1e-11 0 0 [0.17, 0.83, 0.00]
[ 27.724s][INFO] (-) 45.95981583710969 1e-08 1e0 1e0 [0.24, 0.35, 0.41]
[ 27.724s][INFO] (-) 45.985920719463614 1e-08 3e-1 3e-1 [0.41, 0.41, 0.18]
[ 27.724s][INFO] (-) 46.010025797323486 1e-09 4e-1 1e-1 [0.35, 0.24, 0.41]
[ 27.724s][INFO] (-) 58.08965647010484 1e-10 6e-2 6e-2 [0.24, 0.24, 0.52]
[ 27.724s][INFO] (-) 58.17720758382288 1e-09 3e-1 4e-1 [0.36, 0.49, 0.15]
[ 27.724s][INFO] (-) 58.368162080215804 1e-08 1e-1 2e-1 [0.50, 0.34, 0.16]
[ 27.724s][INFO] (-) 58.47182753235027 1e-08 2e-1 2e-1 [0.23, 0.28, 0.50]
[ 27.724s][INFO] (-) 60.16169460590661 1e-10 2e-1 2e-1 [0.34, 0.33, 0.33]
[ 27.724s][INFO] (-) 60.36572993565027 1e-09 8e-1 1e0 [0.33, 0.33, 0.34]
[ 27.724s][TRACE] Writing 'g10-segfault/ev-loop-01.2.structure'
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node engine exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
#0 LAMMPS_NS::NPairFullBinGhostOmp::build(LAMMPS_NS::NeighList*) [clone ._omp_fn.0] () at ../npair_full_bin_ghost_omp.cpp:105
#1 0x00007fa221487e83 in GOMP_parallel (fn=0x7fa2221488a0 <_ZN9LAMMPS_NS20NPairFullBinGhostOmp5buildEPNS_9NeighListE._omp_fn.0(void)>, data=0x7ffd11a0a570, num_threads=1, flags=0)
at /build/gcc/src/gcc/libgomp/parallel.c:168
#2 0x00007fa222148863 in LAMMPS_NS::NPairFullBinGhostOmp::build (this=<optimized out>, list=0x7fa2204a36c0) at ../npair_full_bin_ghost_omp.cpp:47
#3 0x00007fa222331deb in LAMMPS_NS::Neighbor::build (this=0x7fa2204c6200, topoflag=1) at ../neighbor.cpp:2147
#4 0x00007fa222394eab in LAMMPS_NS::Verlet::run (this=0x7fa220578c40, n=1) at ../verlet.cpp:285
#5 0x00007fa2221dd051 in LAMMPS_NS::Run::command (this=this@entry=0x7ffd11a0a7e0, narg=narg@entry=5, arg=arg@entry=0x7fa220440c40) at ../run.cpp:183
#6 0x00007fa221d2603e in LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run> (lmp=<optimized out>, narg=5, arg=0x7fa220440c40) at ../input.cpp:861
#7 0x00007fa221d246ca in LAMMPS_NS::Input::execute_command (this=this@entry=0x7fa220575680) at ../input.cpp:844
#8 0x00007fa221d250b5 in LAMMPS_NS::Input::one (this=0x7fa220575680, single=0x7fa22053d950 "run 1 pre no post no") at ../input.cpp:312
#9 0x00007fa221f3c3b3 in lammps_command (ptr=<optimized out>, str=<optimized out>) at ../library.cpp:223
#10 0x0000556909022b98 in <rsp2_lammps_wrap::low_level::plain::LammpsOwner as rsp2_lammps_wrap::low_level::LowLevelApi>::command ()
#11 0x0000556908d818c7 in <rsp2_lammps_wrap::low_level::mpi::LammpsDispatch as rsp2_lammps_wrap::low_level::mpi_helper::DispatchMultiProcess>::dispatch ()
#12 0x0000556908dd1a0e in <rsp2_lammps_wrap::low_level::mpi_helper::MpiOnDemandInner<D>>::non_root_event_loop ()
#13 0x0000556908dcfff6 in <rsp2_lammps_wrap::low_level::mpi_helper::MpiOnDemand<D>>::install ()
#14 0x0000556908d8f4f1 in rsp2_tasks::entry_points::wrap_main_with_lammps_on_demand ()
#15 0x0000556908d8f456 in rsp2_tasks::entry_points::wrap_main ()
#16 0x0000556908d905f0 in rsp2_tasks::entry_points::_rsp2_acgsd ()
#17 0x0000556908d905d8 in rsp2_tasks::entry_points::rsp2 ()
#18 0x0000556908d7ec03 in std::rt::lang_start::{{closure}} ()
#19 0x000055690917bd93 in std::rt::lang_start_internal::{{closure}} () at libstd/rt.rs:59
#20 std::panicking::try::do_call () at libstd/panicking.rs:310
#21 0x00005569091a114a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:105
#22 0x000055690917e5a6 in std::panicking::try () at libstd/panicking.rs:289
#23 std::panic::catch_unwind () at libstd/panic.rs:392
#24 std::rt::lang_start_internal () at libstd/rt.rs:58
#25 0x0000556908d7ec74 in main ()
Only sometimes (?) fails. The input is an all-carbon structure with high symmetry and 960 atoms. Output files are here: g10.tar.gz; commit ref is 6b6a7b884fec4171b2bbb824af0b627a1f2fca74. Standard output/err is as follows: