adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License

SIGTRAP & SIGILL Errors #209

Closed raulleoncz closed 7 months ago

raulleoncz commented 7 months ago

Hello @bch0w,

I have a question about some problems I've run into. While the optimization is running (using L-BFGS), I keep getting the following error:

================================================================================
EXTERNAL SOLVER ERROR
/////////////////////
The external numerical solver has returned a nonzero exit code (failure). Consider stopping any currently running jobs to avoid wasted computational resources. Check 'scratch/solver/mainsolver/fwd_solver.log' for the solvers stdout log message. The failing command and error message are:

exc: mpirun -n 2 bin/xspecfem2D
err: Command 'mpirun -n 2 bin/xspecfem2D' returned non-zero exit status 133.

At first I thought the simulation had become unstable because of my choice of model parameters, but after some tests I noticed that SPECFEM has no problem as long as the CFL condition is satisfied. I tried increasing the number of points per wavelength and selecting a smaller time step, following the recommendation in fwd_solver.log, but the problem persists. Looking into fwd_solver.log, this time the error was:

Program received signal SIGTRAP: Trace/breakpoint trap.

Backtrace for this error:
Could not print backtrace: executable file is not an executable
Could not print backtrace: executable file is not an executable

#0 0x103742f33
#1 0x1037421db
#2 0x189f07583
#3 0x189f052bb
#4 0x189ed58ff
#5 0x10304af0b
#6 0x103154b0b
#7 0x103158733
#8 0x1030d37cb
#9 0x1030c18ff
#10 0x1030ac3ef
#11 0x103087f2b
#12 0x103065bef
#13 0x102f5dc5b
#14 0x10386e107
#15 0x103608f3b
#16 0x10343458b
#17 0x102dd52eb
#18 0x102de86ef


Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpirun noticed that process rank 1 with PID 0 on node Rauls-MacBook-Pro exited on signal 5 (Trace/BPT trap: 5).

At other times, the error is this:

Color image maximum amplitude = 1.0203639704453120E+018
Color image created

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:

#0 0x1035d6f33
#1 0x1035d61db
#2 0x19d74f583
#3 0x102e40a1f

Is there any way to avoid these problems? The parameters that I'm using are the following:

workflow: inversion
system: workstation
solver: specfem2d
preprocess: default
optimize: LBFGS

ntask: 1
nproc: 2
tasktime: 10000
mpiexec: mpirun
log_level: DEBUG
verbose: False

materials: acoustic
smooth_h: 500.0
smooth_v: 500.0
components: XZ
source_prefix: SOURCE

unit_output: PRE
misfit: traveltime
adjoint: traveltime
normalize: TNORML2
filter: BANDPASS
min_freq: 1
max_freq: 10

I am sorry for the inconvenience but I hope you can help me. Thank you.
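
For readers comparing the two messages above: the "exit status 133" in the SeisFlows error and the "signal 5" reported by mpirun look consistent with the common convention that a process killed by signal N is reported as exit status 128 + N. The sketch below decodes a status under that assumption. The function name `describe_exit_status` is made up for illustration and is not part of SeisFlows or SPECFEM.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status, assuming the 128 + N convention for signal N."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


# 133 is the status quoted in the SeisFlows error message above
print(describe_exit_status(133))
# 132 would correspond to signal 4, which is SIGILL on macOS and Linux
print(describe_exit_status(132))
```

Under this reading, both errors in the post are crashes of the solver process itself rather than ordinary error exits, which is why the solver log is the place to look next.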

bch0w commented 7 months ago

Hi @raulleoncz, this looks like a SPECFEM2D issue, but we can try to see if it's SeisFlows derived.

At what part of the workflow does this issue happen? If it's during the initial forward simulations, that might suggest something is wrong with how you have set up SPECFEM. If it is happening during the line search, then perhaps your model has become unphysical and caused the simulation to fail.

Once we have that information it might be easier to troubleshoot but again it seems like something is going wrong with SPECFEM, leading SeisFlows to crash.
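
One way to gather that information is to scan the solver log for the two symptoms quoted earlier in the thread: a "Program received signal" line and a very large "Color image maximum amplitude" value, which may hint at a simulation that has blown up. The following is a minimal sketch only. The helper name `scan_solver_log` and the threshold of 1e10 are assumptions chosen for illustration, not values taken from SeisFlows or SPECFEM2D.

```python
import re

# Patterns based on the log excerpts quoted in this thread
AMPLITUDE_RE = re.compile(
    r"Color image maximum amplitude\s*=\s*([-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?)"
)
SIGNAL_RE = re.compile(r"Program received signal (SIG[A-Z]+)")


def scan_solver_log(text: str, max_amplitude: float = 1e10) -> dict:
    """Collect signal names and suspiciously large amplitudes from log text.

    max_amplitude is a hypothetical threshold; pick one that suits your units.
    """
    amplitudes = [float(value) for value in AMPLITUDE_RE.findall(text)]
    return {
        "signals": SIGNAL_RE.findall(text),
        "suspicious_amplitudes": [a for a in amplitudes if abs(a) > max_amplitude],
    }


# Sample text copied from the second error in the original post
sample_log = """Color image maximum amplitude = 1.0203639704453120E+018
Color image created
Program received signal SIGILL: Illegal instruction.
"""

report = scan_solver_log(sample_log)
print(report["signals"])
print(report["suspicious_amplitudes"])
```

In practice the text would come from the log file named in the error message (for example, `open("scratch/solver/mainsolver/fwd_solver.log").read()`), and checking which workflow stage wrote that log helps separate a setup problem from a line-search problem.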

raulleoncz commented 7 months ago

I think I found my error. The CFL condition and time stepping were fine, but the source frequency was a little too high. I chose a lower frequency and I think the problem is solved.

Thank you.
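
The fix described above (lowering the source frequency) can be sanity-checked with a rough points-per-wavelength estimate. The sketch below is an illustration under several assumptions: the function name `points_per_wavelength` is made up, the maximum frequency is taken as 2.5 times the dominant frequency (a rule of thumb often used for Ricker wavelets), each element is assumed to have 5 GLL points per direction, and the velocity and element size are hypothetical numbers, not values from this thread.

```python
def points_per_wavelength(
    v_min: float,
    element_size: float,
    f0: float,
    ngll: int = 5,
    fmax_factor: float = 2.5,
) -> float:
    """Estimate grid points per shortest wavelength.

    v_min        -- slowest wave speed in the model (m/s)
    element_size -- edge length of the largest element (m)
    f0           -- dominant source frequency (Hz)
    ngll         -- grid points per element edge (assumed 5 here)
    fmax_factor  -- assumed ratio of maximum to dominant frequency
    """
    f_max = fmax_factor * f0
    shortest_wavelength = v_min / f_max
    # Average spacing between grid points along one element edge
    average_spacing = element_size / (ngll - 1)
    return shortest_wavelength / average_spacing


# Hypothetical acoustic model: 1500 m/s minimum velocity, 50 m elements
for f0 in (10.0, 5.0):
    ppw = points_per_wavelength(v_min=1500.0, element_size=50.0, f0=f0)
    print(f"f0 = {f0} Hz: {ppw:.1f} points per shortest wavelength")
```

Halving the dominant frequency doubles the estimate, which matches the observation that a lower source frequency resolved the problem even though the CFL condition was already satisfied. The resolution values reported in the solver's own log remain the authoritative check.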

bch0w commented 7 months ago

That's great to hear! Happy you were able to solve the problem. I'll close this as complete but if that is incorrect please feel free to re-open!