adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License

Issue with LINE SEARCH FAILED. #220

Open trinitite271 opened 2 weeks ago

trinitite271 commented 2 weeks ago

After fixing the tomo bug (https://github.com/adjtomo/seisflows/issues/219), I tried to invert the model, but I encountered 'LINE SEARCH FAILED'. I guessed that one shot was not enough, so I inverted with 30 shots, but I still got 'LINE SEARCH FAILED'. The gradient looks good, the step length is increasing, and the misfit is decreasing, so everything else looks fine. The full result is at https://github.com/trinitite271/seisflow1/tree/master

2024-06-22 15:20:54 [INFO] | FINISH QUANTIFY MISFIT: 030
2024-06-22 15:20:54 [INFO] | misfit f_try (i01s04) = 1.544E-04
2024-06-22 15:20:54 [INFO] | saving misfit and step length for step count == 4
2024-06-22 15:20:54 [INFO] | step count = 0, 1, 2, 3, 4
2024-06-22 15:20:54 [INFO] | step length = 0.000E+00, 3.209E-01, 5.192E-01, 8.401E-01, 1.359E+00
2024-06-22 15:20:54 [INFO] | misfit val = 1.718E-04, 1.673E-04, 1.647E-04, 1.606E-04, 1.544E-04
2024-06-22 15:20:54 [INFO] | increment step count -> 5
2024-06-22 15:20:54 [INFO] | fail: bracketing line search has failed to reduce the misfit before exceeding step_count_max=5
2024-06-22 15:20:54 [DEBU] | checking gradient/search direction angle, theta: 0.000
2024-06-22 15:20:54 [INFO] | search direction below threshold 0.001, will not attempt restart
2024-06-22 15:20:54 [CRIT] |

                        LINE SEARCH FAILED
                        //////////////////

Line search has failed to reduce the misfit and has run out of fallback options. Aborting inversion.

gradient: fig1

trinitite271 commented 1 week ago

To be clear, this is a simple elastic wave inversion that updates vp and vs at the same time. I'm not sure whether it is affected by cross-talk noise or whether there is something wrong with my parameter file. I also tried increasing the step count, but then I get 'minimum poisson's ratio is negative'. I have tested (with a MATLAB FD2D code) that simultaneous vp/vs inversion is usually affected by cross-talk noise, even in a model test.

I think the model update may not be good enough, so the line search wants to keep going and reaches the step count limit (step_count_max=5). My question is: why is the model not updated after reaching step_count_max, even though the data misfit has dropped?

2024-06-23 17:51:22 [INFO] | FINISH QUANTIFY MISFIT: 030
2024-06-23 17:51:22 [INFO] | misfit f_try (i01s07) = 1.152E-04
2024-06-23 17:51:22 [INFO] | saving misfit and step length for step count == 7
2024-06-23 17:51:22 [INFO] | step count = 0, 1, 2, 3, 4, 5, 6, 7
2024-06-23 17:51:22 [INFO] | step length = 0.000E+00, 3.209E-01, 5.192E-01, 8.401E-01, 1.359E+00, 2.199E+00, 3.559E+00, 5.758E+00
2024-06-23 17:51:22 [INFO] | misfit val = 1.718E-04, 1.673E-04, 1.647E-04, 1.606E-04, 1.544E-04, 1.454E-04, 1.330E-04, 1.152E-04
2024-06-23 17:51:22 [INFO] | increment step count -> 8
2024-06-23 17:51:22 [INFO] | first iteration, defaulting to bracketing line search
2024-06-23 17:51:22 [INFO] | try: misfit not bracketed, increasing step length using golden ratio
2024-06-23 17:51:22 [DEBU] | checking safeguard min allowable step length: 0.01%
2024-06-23 17:51:22 [DEBU] | checking safeguard max allowable step length: 100%
2024-06-23 17:51:22 [INFO] | step length alpha = 9.316E+00
2024-06-23 17:51:22 [INFO] | updating model with dm (dm_min=-5.47E+02, dm_max = 4.29E+02)
2024-06-23 17:51:22 [INFO] | trial step unsuccessful. re-attempting line search
2024-06-23 17:51:24 [INFO] | LINE SEARCH STEP COUNT 08

2024-06-23 17:51:24 [INFO] | m_try model parameters for line search evaluation:
2024-06-23 17:51:24 [WARN] | no coordinates found for assumed SPECFEM2D model, will not be able to plot figures
2024-06-23 17:51:24 [WARN] | minimum poisson's ratio is negative
2024-06-23 17:51:24 [INFO] | vp: min=416.160; mean=1417.071; max=1885.683
2024-06-23 17:51:24 [INFO] | vs: min=477.623; mean=716.117; max=942.920
2024-06-23 17:51:24 [INFO] | evaluating objective function for source 001
2024-06-23 17:51:24 [DEBU] | running forward simulation with 'Specfem2D'
2024-06-23 17:51:24 [DEBU] | running executable with cmd: 'bin/xmeshfem2D'
2024-06-23 17:51:25 [DEBU] | running executable with cmd: 'bin/xspecfem2D'
2024-06-23 17:51:33 [CRIT] |

                        EXTERNAL SOLVER ERROR
                        /////////////////////

The external numerical solver has returned a nonzero exit code (failure). Consider stopping any currently running jobs to avoid wasted computational resources. Check 'scratch/solver/mainsolver/fwd_solver.log' for the solvers stdout log message. The failing command and error message are:

exc: bin/xspecfem2D
err: Command 'bin/xspecfem2D' returned non-zero exit status 134.
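For context on the 'minimum poisson's ratio is negative' warning in the log above: for an isotropic elastic medium, Poisson's ratio follows from vp and vs alone, and it turns negative wherever vp/vs drops below √2. The sketch below is a stand-alone check, not SeisFlows code; the function name `poisson_ratio` and the sample (vp, vs) pairs are made up for illustration.

```python
def poisson_ratio(vp, vs):
    """Poisson's ratio of an isotropic elastic medium from vp and vs."""
    return (vp**2 - 2.0 * vs**2) / (2.0 * (vp**2 - vs**2))


# Hypothetical (vp, vs) pairs in m/s; these are NOT taken from the model files.
samples = [(1732.05, 1000.0), (1400.0, 700.0), (600.0, 480.0)]

ratios = [poisson_ratio(vp, vs) for vp, vs in samples]
flags = [nu < 0.0 for nu in ratios]  # True where the ratio is negative

for (vp, vs), nu in zip(samples, ratios):
    print(f"vp={vp:8.2f} vs={vs:8.2f} vp/vs={vp / vs:.3f} nu={nu:+.3f}")
```

Only the last pair has vp/vs below √2, so only it is flagged. The vp and vs minima printed in the log are not necessarily located at the same grid point, so they cannot simply be plugged in as a pair.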

trinitite271 commented 1 week ago

Update: I set the initial vs model equal to the true vs model, which should avoid cross-talk noise. However, the same problem still occurs. The vp gradient looks great, but:

when step_count_max=5:

'fail: bracketing line search has failed to reduce the misfit before exceeding',

when step_count_max=9:

2024-06-24 16:20:32 [INFO] | misfit f_try (i01s08) = 2.579E-05
2024-06-24 16:20:32 [INFO] | saving misfit and step length for step count == 8
2024-06-24 16:20:32 [INFO] | step count = 0, 1, 2, 3, 4, 5, 6, 7, 8
2024-06-24 16:20:32 [INFO] | step length = 0.000E+00, 1.579E+00, 2.556E+00, 4.135E+00, 6.690E+00, 1.083E+01, 1.579E+01, 1.579E+01, 1.579E+01
2024-06-24 16:20:32 [INFO] | misfit val = 3.157E-05, 3.071E-05, 3.022E-05, 2.947E-05, 2.838E-05, 2.696E-05, 2.579E-05, 2.579E-05, 2.579E-05
2024-06-24 16:20:32 [INFO] | increment step count -> 9
2024-06-24 16:20:32 [CRIT] | polynomial line fitting returned a negative p[0] value which signifies a negative misfit and is not allowed.

vp gradient: image
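The plateau at 1.579E+01 in the log above is what golden-ratio growth looks like once it runs into an upper bound. A minimal sketch, assuming the cap is simply ten times the first trial step (a value read off the log, not taken from the SeisFlows source); `capped_golden_steps` is a hypothetical helper:

```python
GOLDEN_RATIO = 1.618034


def capped_golden_steps(alpha_first, alpha_max, n_steps):
    """Grow a step length by the golden ratio, never exceeding alpha_max."""
    steps = [alpha_first]
    while len(steps) < n_steps:
        steps.append(min(GOLDEN_RATIO * steps[-1], alpha_max))
    return steps


# First trial step and plateau value as printed in the log (rounded there).
steps = capped_golden_steps(alpha_first=1.579, alpha_max=15.79, n_steps=8)
print(", ".join(f"{a:.3E}" for a in steps))
```

Under this assumption the last three trial steps are identical, so they would evaluate the same model. That would explain why the misfit repeats at 2.579E-05.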

trinitite271 commented 3 days ago

Updates and suggestions: After a full debugging session with pdb, I found the problem. In the first iteration, the bracketing line search keeps increasing the step length until the misfit increases. But in parameters.yaml we set a default step_len_max: 0.1, which caps the step length. As a result, if the misfit is still decreasing when step_count reaches its limit (step_count_max), the line search stops with an error.

The loop happens here, in bracket.py:

```python
elif self.step_count < self.step_count_max and all(f <= f[0]):
    alpha = 1.618034 * x[-1]  # 1.618034 is the 'golden ratio'
    logger.info(f"try: misfit not bracketed, increasing step length "
                f"using golden ratio")
    status = "TRY"
```

However, we can't increase the step length indefinitely, because that leads to a wrong result and an incorrect Vp/Vs ratio.

So I think we should set a reasonable step length limit, but when the maximum step length is reached, SeisFlows should update the model using that maximum step length rather than raise an error.
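To make the behaviour described above concrete, here is a simplified sketch of the decision the line search makes after each trial step. Only the 'TRY' branch mirrors the snippet quoted from bracket.py; the `bracket_status` function, the 'PASS' test and the return values are assumptions for illustration, and the real module does more (for example polynomial fitting and step length safeguards).

```python
GOLDEN_RATIO = 1.618034


def bracket_status(x, f, step_count, step_count_max):
    """Return (status, alpha) for a simplified bracketing check.

    x: step lengths tried so far, with x[0] == 0
    f: misfit at each of those step lengths
    """
    f_min = min(f)
    # Bracketed: the misfit dropped below f[0] and then rose again.
    if f_min < f[0] and f[-1] > f_min:
        return "PASS", x[f.index(f_min)]  # simplification: best step tried
    # Not bracketed, but trial steps remain: grow the step length.
    if step_count < step_count_max and all(fi <= f[0] for fi in f):
        return "TRY", GOLDEN_RATIO * x[-1]
    return "FAIL", None


# Step lengths and misfits from the first log in this thread.
x = [0.0, 0.3209, 0.5192, 0.8401, 1.359]
f = [1.718e-4, 1.673e-4, 1.647e-4, 1.606e-4, 1.544e-4]

at_limit = bracket_status(x, f, step_count=5, step_count_max=5)
with_room = bracket_status(x, f, step_count=5, step_count_max=9)
print(at_limit, with_room)
```

With a limit of 5 the sketch returns 'FAIL' even though every trial lowered the misfit, while a higher limit requests a further step near 2.199, which is the next step length that appears in the second log.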

bch0w commented 1 day ago

Hi @trinitite271, thanks for the detailed logging here. Yes, this has now become more of a science problem than a code problem.

The Line Search module is attempting to work a bracketing line search to get a handle on the misfit space, which means it needs to be able to reduce the misfit, and then increase it again, so that it can estimate the curvature of the misfit space. The fact that the line search is able to reduce the misfit but not re-increase it means that it cannot do this, suggesting the problem may be ill-posed (starting solution too far from the true solution, not enough constraint from the 'data', misfit space is not well-behaved, etc.).
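As a rough illustration of why the bracket matters, the sketch below fits a parabola through three (step length, misfit) points and returns its vertex, but only when the middle point is the lowest of the three. It is a textbook parabolic interpolation, not the SeisFlows implementation; `parabolic_minimum` is a hypothetical name, and the first set of values is synthetic.

```python
def parabolic_minimum(x, f):
    """Vertex of the parabola through three points, or None if not bracketed.

    Expects x[0] < x[1] < x[2].
    """
    (x0, x1, x2), (f0, f1, f2) = x, f
    if not (f1 < f0 and f1 < f2):
        return None  # middle point is not the lowest: no bracket to fit
    num = (x1 - x0) ** 2 * (f1 - f2) - (x1 - x2) ** 2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    return x1 - 0.5 * num / den


# Synthetic bracketed case: f(x) = (x - 1.5)**2 + 0.2 sampled at x = 0, 1, 3.
bracketed = parabolic_minimum((0.0, 1.0, 3.0), (2.45, 0.45, 2.45))

# Last three trial steps of the first log: the misfit only decreases.
unbracketed = parabolic_minimum(
    (0.5192, 0.8401, 1.359), (1.647e-4, 1.606e-4, 1.544e-4)
)
print(bracketed, unbracketed)
```

The synthetic case recovers the minimum at 1.5, while the values from the log give no bracket, so there is nothing to constrain where the minimum lies along the search direction.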

As you have discovered, you have a few knobs to turn to help this along. step_len_init determines how far (as a percentage of your model) your first step goes in updating the model, and step_count_max dictates how many tries the line search will attempt before declaring failure. step_len_min and step_len_max dictate how small or large, respectively, your steps can be, so that you do not update by a negligible amount or by too drastic an amount. You may also need more regularization/postprocessing to make your gradient better behaved.

I would advise against updating a model just because you hit the maximum step length. The line search is really telling you that it does not have a good handle on the misfit space, so you may be updating into a local minimum and/or your final model may be incorrect/unphysical, which is what it is telling you when you see a Vp/Vs ratio error, Poisson's ratio error, etc.

Unfortunately this is just a part of solving these types of non-unique, iterative inverse problems! Even for elastic 2D problems.