LBFGS - Githubissues

AlbertZhangHIT commented 4 years ago

Hi Daniel, I feel puzzled about the normlization and denormalization of gradients in LBFGS. Why the gradients multiply C_vp and C_rho in both normalization and denormalization? It will make the backtracking line search fail.

/* Normalization of the gradient   */
/* ------------------------------- */
for (i=1;i<=NX;i=i+IDX){
   for (j=1;j<=NY;j=j+IDY){
      waveconv[j][i] = C_vp * waveconv[j][i];
   }
}

for (i=1;i<=NX;i=i+IDX){
   for (j=1;j<=NY;j=j+IDY){
      gradp[j][i] = waveconv[j][i];
   }
}

/* ===================================================================================================================================================== */
/* ===================================================== GRADIENT rho ================================================================================== */
/* ===================================================================================================================================================== */

/* Normalization of the gradient   */
/* ------------------------------- */
for (i=1;i<=NX;i=i+IDX){
   for (j=1;j<=NY;j=j+IDY){
      waveconv_rho[j][i] = C_rho * waveconv_rho[j][i];
   }
}

     /* Denormalize Gradients */
     for (i=1;i<=NX;i=i+IDX){
        for (j=1;j<=NY;j=j+IDY){

           waveconv[j][i] = waveconv[j][i] * C_vp;
       waveconv_rho[j][i] = waveconv_rho[j][i] * C_rho;

        }
     }

daniel-koehn commented 4 years ago

Hi Albert,

Thanks for the bug report. You are right, this normalization/denormalization makes no sense. The l-bfgs approach originally used the Pseudo-Hessian as Hessian approximation, where I normalized the Hessian approximation and denormalized the search direction from l-bfgs similar to Brossier (2011): https://www.sciencedirect.com/science/article/pii/S0098300410003237 During the modularization of the DENISE code, I removed the Pseudo-Hessian approximation from the l-bfgs algorithm. However, it might be a good idea to re-implement the Pseudo-Hessian approximation, because it improves the convergence of the l-bfgs algorithm, as I have shown in this tutorial for georadar FWI: https://danielkoehnsite.files.wordpress.com/2018/10/koehn_etal_2017_germaine_te_fwi.pdf

Best regards,

Daniel

AlbertZhangHIT commented 4 years ago

But if we use EPRECOND the gradient should have been preconditioned in the computation proccedure grad_obj_**. Do we have to re-do the precondition again in the two-loop recursion l-BFGS?

Best regards, Albert

daniel-koehn commented 4 years ago

Hi Albert,

Technically, you have to use the gradients, not preconditioned gradients in the l-bfgs recursion loops (see p. 95+96 in my georadar fwi presentation). You can then either compute an initial Hessian approximation based on model and gradient differences (p. 95) or use the Pseudo-Hessian (p. 96). However, the l-bfgs optimization also seem to converge when using preconditioned gradients, instead of pure gradients.

Best regards,

Daniel

AlbertZhangHIT commented 4 years ago

I got it. Thank you Daniel.

Best,

Albert

AlbertZhangHIT commented 4 years ago

Hi Daniel, I'm sorry to bother you and reopen this issue.

I encountered a issue about LBFGS in AC case, which showed that it leads to unstable simulation after tens of FWI iterations when I use only p conponent and the GRAD_FORM=2 (I have set the upper limit for model parameters to be the maximum of the true model). In GRAD_FORM=2 and only p conponent case, the gradient is calculated by

for (i=1;i<=NX;i=i+IDXI){ 
    for (j=1;j<=NY;j=j+IDYI){
        if(GRAD_FORM==1){
            // gradients with data integration 
            (*fwiPSV).forward_prop_x[imat] = (*waveAC).p[j][i];  
        }
        if(GRAD_FORM==2){
            // gradients without data integration
            (*fwiPSV).forward_prop_x[imat] = (*waveAC).ux[j][i];  
        }
        if(QUELLTYPB>=4){
            (*fwiPSV).forward_prop_x[imat] *= -1.0;
        }            
        imat++;
    }
}

The gradient is multiplied by -1 in p conponent case because the p-conponent residual and y-conponent residual has different sign shown as follows (p-conponent residual at top, and y-conponent residual at bottom). p adj shot6 y adj shot6

daniel-koehn commented 4 years ago

Hi Albert,

Your bug reports are always welcome. I have some questions:

Are you using the GRAD_FORM=2 AC gradients from the DENISE Github-Repository? I think I have not implemented them yet, therefore activating GRAD_FORM=2 will generally produce no reasonable results.
If you have implemented correct GRAD_FORM=2 gradients, does the PCG method (GRAD_METHOD=1, PCG_BETA=2) produce more reasonable results? I have experienced some problems with the l-bfgs algorithm when the problem has an even number of parameter classes as in the acoustic case (vp, density), while for an odd number of parameter classes like the PSV case (vp, vs, density) l-bfgs seem to converge correctly. There might be a bug in the l-bfgs implementation when using even number parameter class problems.

The problem with the sign change between p- and y-component data is interesting. I will take a closer look into it.

Best regards,

Daniel

AlbertZhangHIT commented 4 years ago

Hi Daniel,

I implemented the GRAD_FORM=2 AC gradients in ac.c as the snippet shows in the last comment, which I followed the PSV gradients calculation in psv.c. I activated the choice of GRAD_FORM from the input file for DENISE.
I muted the gradient computation and update of density such that density keeps as constant. It has only one parameter class in this acoustic case. As far as I observed, the unstable simulation situation happens especially when the initial velocity is far away from the true velocity(eg. heavily Gaussian smoothed). I also tested GRAD_FORM=1 and the unstable simulation also happened. The unstable simulation happens because 0 occurs in rho_LBFGS vector.

Best regards,

Albert

daniel-koehn commented 4 years ago

Hi Albert,

Thanks for the infos, I just calculated the gradients for the 2D acoustic problem in pressure-velocity formulation according to Hu (2012, p. 85 - 88)

The resulting gradients without data integration should look like this: Vp + density gradients

The code modifications are already uploaded to the Github repository.

The acoustic FWI results for the Marmousi-2 model using PCG, GRAD_FORM=2 and the workflow from the Marmousi-Quickstart tutorial, starting from the 1D initial model ...

Marmousi_grad_form_2

... look reasonable for the Vp-model (right) and quite crappy for the density model (left). One reason might be that the data integration in the GRAD_FORM=1 gradients introduces low frequency content, missing in GRAD_FORM=2. Another issue is the simple inverse Hessian approximation.

Next, I will check how the l-bfgs optimization will converge. If you suppress the density model updates in the workflow file, the density gradients are simply set to zero, but the number of parameter classes does not change. I will also check if removing the density updates from the l-bfgs optimzation will indeed improve its convergence.

Best regards,

Daniel

daniel-koehn commented 4 years ago

Hi Albert,

I have run the same Marmousi-2 test problem as above using l-BFGS instead of PCG optimization for the GRAD_FORM=2 gradients, inverting pressure component data. The resolution of the resulting vp model is a little bit improved, while the density model looks still crappy, even though in a different way

marmousi-2_grad_form_2_lbfgs_vp_vs

Inverting only for the vp model by setting INV_RHO_ITER = 1000 at each inversion stage in the FWI workflow file improves the resolution significantly

marmousi-2_grad_form_2_lbfgs_vp_only

Of course, I used unrealistic ideal conditions for this FWI run, including a low pass filtered spike wavelet with low frequency content and an 1D initial model not introducing cycle-skipping. Nevertheless, the FWI seem to converge, so the l-BFGS implementation in DENISE for the acoustic problem seem to be correct, but might fail depending on the problem.

It is still quite surprising that the density inversion converges much better when using vx-vy-component data and GRAD_FORM=1

marmousi-2_grad_form_1_lbfgs_vx_vy

so I would not rule out an implementation issue for the GRAD_FORM=2 density gradient.

Best regards,

Daniel

AlbertZhangHIT commented 4 years ago

Hi Daniel,

Thank you for your great effort to check those issues. I guess the unstable simulation situation should occur in cycle-skipping case. I realized that the heavily Gaussian smoothed initial model and a single relatively broad frequency band (eg. 3-10Hz) in my experiments may lead to cycle-skipping.

Best regards, Albert

daniel-koehn / DENISE-Black-Edition

LBFGS #23