MITgcm / MITgcm

M.I.T General Circulation Model master code and documentation repository
http://mitgcm.org/
MIT License

quite a few AD experiments have awful gradient checks #518

Status: open. mjlosch opened this issue 2 years ago.

mjlosch commented 2 years ago

Here are the experiments with a grdchk RMS > 1e-4:

From the verification directory: grep "grdchk  summary" */results/output_adm*.txt (output lightly edited):
OpenAD:                             grdchk  summary  :  RMS of    8 ratios =  3.3808379600181E+01
global_ocean.90x40x15.bottomdrag:   grdchk  summary  :  RMS of    4 ratios =  1.1908421824239E-03
global_ocean.90x40x15.kapredi:      grdchk  summary  :  RMS of    4 ratios =  3.1445586860325E-02
global_ocean.90x40x15:              grdchk  summary  :  RMS of    4 ratios =  5.9224476803208E-02
global_ocean.cs32x15.seaice:        grdchk  summary  :  RMS of    4 ratios =  6.9752447929930E-04
global_ocean.cs32x15.seaice_dynmix: grdchk  summary  :  RMS of    4 ratios =  7.8573080910848E-03
global_with_exf:                    grdchk  summary  :  RMS of    5 ratios =  2.6722023344107E-04
lab_sea:                            grdchk  summary  :  RMS of    5 ratios =  6.7134028427524E-04
offline_exf_seaice.thsice:          grdchk  summary  :  RMS of    4 ratios =  1.0948070499757E-01
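The "some editing" step (prefixing each summary line with the experiment name and keeping only the bad ones) could be scripted. Below is a minimal sketch, assuming the edited format shown above, where each line starts with the experiment name followed by a colon. The helpers parse_summaries and flag_bad are hypothetical names, not part of the MITgcm testing scripts.

```python
import re

# Matches the edited grep output above, e.g.
#   "lab_sea:   grdchk  summary  :  RMS of    5 ratios =  6.7134028427524E-04"
# (assumption: the experiment name precedes the first colon).
SUMMARY_RE = re.compile(
    r"^(?P<exp>[^:\s]+):\s+grdchk\s+summary\s*:\s*"
    r"RMS of\s+(?P<n>\d+)\s+ratios\s*=\s*(?P<rms>\S+)"
)


def parse_summaries(lines):
    """Return {experiment: (number_of_ratios, rms)}; non-matching lines are skipped."""
    results = {}
    for line in lines:
        m = SUMMARY_RE.match(line.strip())
        if m:
            results[m.group("exp")] = (int(m.group("n")), float(m.group("rms")))
    return results


def flag_bad(results, threshold=1e-4):
    """List (experiment, rms) pairs whose RMS exceeds threshold, worst first."""
    bad = [(exp, rms) for exp, (_, rms) in results.items() if rms > threshold]
    return sorted(bad, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: python flag_grdchk.py < edited_grep_output.txt
    for exp, rms in flag_bad(parse_summaries(sys.stdin)):
        print(f"{exp:40s} {rms:.3e}")
```

Sorting worst-first makes it easy to tackle the large outliers (such as OpenAD) before the borderline cases.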

At least some of these numbers can be improved considerably by adjusting grdchk_eps. For example, for OpenAD, setting grdchk_eps=1e-6 (currently 1e-2) reduces the RMS to 3.5E-01 (still not great); for global_ocean.90x40x15, setting grdchk_eps=1e-4 (currently 1e-2) reduces the RMS to 2.8E-06; and so on. Some cannot be improved, and their AD gradients are probably wrong, but those are difficult to identify until we have fixed the ones where the large RMS is due to an inappropriate grdchk_eps.
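Why a large grdchk_eps inflates the RMS can be seen on a toy problem. The sketch below is not MITgcm code: it assumes a central (two-sided) finite difference, a ratio defined as 1 - g_AD/g_FD, and uses the exact derivative of a scalar function as a stand-in for the AD gradient. With a perfect "adjoint", the truncation error of the FD estimate alone pushes the RMS past 1e-4 for eps = 1e-2, but not for eps = 1e-4.

```python
import math


def fd_gradient(f, x, eps):
    """Central finite-difference estimate of df/dx (truncation error ~ eps**2)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def rms_of_ratios(f, dfdx, points, eps):
    """RMS over several control points of the ratio 1 - g_AD / g_FD (assumed definition)."""
    ratios = [1.0 - dfdx(x) / fd_gradient(f, x, eps) for x in points]
    return math.sqrt(sum(r * r for r in ratios) / len(ratios))


def cost(x):
    """Toy nonlinear 'cost function' (hypothetical, for illustration only)."""
    return math.exp(3.0 * x)


def cost_grad(x):
    """Exact derivative of cost, playing the role of a correct AD gradient."""
    return 3.0 * math.exp(3.0 * x)


POINTS = [0.1, 0.5, 1.0]

if __name__ == "__main__":
    # Large eps: truncation error dominates. Very small eps (not shown) would
    # eventually be dominated by round-off instead, so eps must be tuned.
    for eps in (1e-2, 1e-4):
        print(f"eps={eps:.0e}  RMS={rms_of_ratios(cost, cost_grad, POINTS, eps):.2e}")
```

For this function the ratio is roughly 1.5*eps**2 at every point, so the RMS drops by about four orders of magnitude when eps goes from 1e-2 to 1e-4. In a real model the optimal eps also depends on nonlinearities and thresholds in the code, which is why it has to be chosen per experiment.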

I suggest adjusting the individual grdchk_eps values to get better agreement where possible. This implies updating the reference output, as the FD numbers will change.
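Changing grdchk_eps per experiment is a one-line namelist edit. A minimal sketch follows, assuming the value is set in a namelist file (typically data.grdchk in the experiment's AD input directory) on a line of the form "grdchk_eps = 1.d-2,". The helper set_grdchk_eps is hypothetical and only rewrites text; the reference output still has to be regenerated by rerunning the experiment.

```python
import re

# Fortran namelists are case-insensitive, hence IGNORECASE. The value is taken
# to end at the first comma, whitespace, or "/" (assumed layout).
_EPS_RE = re.compile(r"^(\s*grdchk_eps\s*=\s*)([^,\s/]+)", re.IGNORECASE | re.MULTILINE)


def set_grdchk_eps(namelist_text, new_eps):
    """Return namelist_text with every grdchk_eps assignment set to new_eps."""
    # A function replacement avoids backreference surprises in values like "1.E-4".
    new_text, count = _EPS_RE.subn(lambda m: m.group(1) + new_eps, namelist_text)
    if count == 0:
        raise ValueError("no grdchk_eps assignment found")
    return new_text


if __name__ == "__main__":
    example = " &GRDCHK_NML\n grdchk_eps = 1.d-2,\n nbeg = 1,\n &\n"
    print(set_grdchk_eps(example, "1.d-4"))
```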

mjlosch commented 1 year ago

We can add these non-TAF experiments to the list:

OpenAD.tap_adj:            grdchk  summary  :  RMS of    8 ratios =  3.3808379600186E+01
OpenAD.oad:                grdchk  summary  :  RMS of    8 ratios =  3.3808379724557E+01
OpenAD.oad.kpp:            grdchk  summary  :  RMS of    8 ratios =  3.3808313630796E+01
OpenAD.oad.ggl90:          grdchk  summary  :  RMS of    8 ratios =  3.5426055083717E-01
global_ocean.90x40x15.oad: grdchk  summary  :  RMS of    4 ratios =  5.9224477025747E-02
global_with_exf.tap:       grdchk  summary  :  RMS of    5 ratios =  2.6722023344107E-04

mjlosch commented 1 year ago

Once #748 is merged, only these experiments with RMS values > 1e-4 remain:

lab_sea
OpenAD.oad.ggl90
global_ocean.cs32x15.seaice
global_ocean.cs32x15.seaice_dynmix

These all involve either the seaice package (lab_sea and the two cs32x15 experiments) or the ggl90 package, two packages with potentially inaccurate gradients.

For the OpenAD.oad.ggl90 experiment with TAF-AD, I can reduce the gradient error RMS value to 9e-5 by using mxlMaxFlag=1 (and even to 9e-6 after also setting grdchk_eps=1.e-4). This is consistent with previous experience that the mxlMaxFlag=2 setting can be unstable and lead to a blow-up of the AD model. With OpenAD code, the RMS value remains high, so the OpenAD code may have a problem here. Maybe we should add a ggl90 gradient check for TAF-AD code.

The RMS value for global_ocean.cs32x15.seaice_dynmix can be reduced slightly, to 8e-4, by using grdchk_eps=1e-1, but at the cost of increasing the RMS value of the main experiment to >1e-4, so that is probably not a good solution.