aymeric-spiga / dynamico-giant


Problem in bilinearbig in 64 levels #7

Closed aymeric-spiga closed 5 years ago

aymeric-spiga commented 5 years ago

Yes, with version 1984 the problem is still there:

```
000: forrtl: severe (408): fort: (3): Subscript #2 of the array F2D_ARR has value -858993460 which is less than the lower bound of 1
000:
000: Image              PC                Routine            Line     Source
000: icosa_lmdz.exe     0000000002B95E66  Unknown            Unknown  Unknown
000: icosa_lmdz.exe     00000000014F1DBB  bilinearbig        95       bilinearbig.f90
000: icosa_lmdz.exe     00000000014C16C4  interpolateh2h2    125      interpolateH2H2.f90
000: icosa_lmdz.exe     00000000012B419A  optcv              193      optcv.f90
000: icosa_lmdz.exe     0000000001097C99  callcorrk          812      callcorrk.f90
000: icosa_lmdz.exe     0000000000DB955E  physiq_mod_mp_phy  825      physiq_mod.f90
```

Originally posted by @ehouarn in https://github.com/aymeric-spiga/dynamico-giant/issues/4#issuecomment-425438667

aymeric-spiga commented 5 years ago

This was discussed a bit in #4; see e.g. https://github.com/aymeric-spiga/dynamico-giant/issues/4#issuecomment-425398379 and https://github.com/aymeric-spiga/dynamico-giant/issues/4#issuecomment-425399328

aymeric-spiga commented 5 years ago

@ehouarn This is a problem that appears when you try with 64 levels. It works like a charm with 32 levels.

aymeric-spiga commented 5 years ago

Also, the present version r2004 of LMDZ.GENERIC works well with 64 levels in 1D runs (saturn1d).

aymeric-spiga commented 5 years ago

@debbardet do you still have the problem in 3D with 64 levels?

debbardet commented 5 years ago

I just launched a job to check: I am testing r1984 first, then r2004.

aymeric-spiga commented 5 years ago

OK, maybe revision 2004 fixes the problem. We will know quickly from your tests!

debbardet commented 5 years ago

I still have this problem with both.

aymeric-spiga commented 5 years ago

OK, so let us summarize: on my side, it works in 3D with 32 levels and r2004 for the physics (except that after 80 iterations the model has a XIOS problem, but this is another issue, discussed in #8), and on your side it does not work in 3D with 64 levels. So this is a problem related to running with 64 levels, and possibly to the "tweaking" of the start files.

(To be sure that there is no problem with your installation, please try with 32 levels and run the model from makestart. The problem with F2D_ARR should not appear.)

aymeric-spiga commented 5 years ago

To summarize our latest tests here with version 2004 of LMDZ.GENERIC:

- @debbardet and @aymeric-spiga had it working with 32 levels for Saturn
- @alboiss had it working with 32 levels for Jupiter

So this is for sure a problem with 64 levels.

debbardet commented 5 years ago

Ok, I tested the model with 64 levels:

aymeric-spiga commented 5 years ago

OK great! So let us close this issue; I opened a new one, #10, for the new problem.

ehouarn commented 5 years ago

Final word on the `forrtl: severe (408): fort: (3): Subscript #2 of the array F2D_ARR has value -858993460 which is less than the lower bound of 1` issue: this error is due to the dynamics sending NaN fields (temperature, at the very least) to the physics, and it is thus a problem with the initial-conditions file that was used by the dynamics.
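(For the record, here is a minimal sketch of the failure mechanism; this is not the actual bilinearbig code and all names are hypothetical. When a NaN reaches an interpolation routine, converting it to an integer subscript yields an unspecified bit pattern, which is how a garbage value like -858993460 can end up failing the runtime bounds check in a `-check bounds` build.)

```fortran
program nan_subscript_demo
  ! Hypothetical illustration: a NaN temperature field turns into a
  ! garbage array subscript when used to locate an interpolation index.
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  real    :: t_nan, grid_step
  integer :: idx

  t_nan     = ieee_value(t_nan, ieee_quiet_nan)  ! what the physics received
  grid_step = 10.0

  ! INT() of a NaN is processor-dependent: the result is an arbitrary
  ! bit pattern, possibly hugely negative, hence the
  ! "Subscript #2 ... has value -858993460" bounds-check failure.
  idx = int(t_nan / grid_step) + 1

  ! A guard like this in the caller would catch the bad field early:
  if (ieee_is_nan(t_nan)) then
     print *, 'NaN field passed to interpolation; check the start files'
  else
     print *, 'index = ', idx
  end if
end program nan_subscript_demo
```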

aymeric-spiga commented 5 years ago

Did the dynamics send NaN fields because the model went numerically unstable? Probably, didn't it?

debbardet commented 5 years ago

With the using_server mode set to false, trying to run the model with my modified start files on a 61-level discretization, I get this error in bilinearbig once more. The model ran for 1 hour (in debug mode) before being cancelled, whereas with the start files created by makestart everything is fine.

PS: I changed to 61 levels because I thought the 64-level configuration was too much and therefore possibly unstable.

aymeric-spiga commented 5 years ago

I think it should work with using_server = .false., but that is a detail. You were right to switch to 61 levels, because 64 levels has the problem mentioned in issue #9.
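(For reference, this switch usually lives in the XIOS iodef.xml; a sketch assuming a standard XIOS setup, the exact file layout in this repository may differ:)

```xml
<!-- iodef.xml (sketch): run XIOS in attached mode, i.e. no separate
     server processes; each model process handles its own output -->
<simulation>
  <context id="xios">
    <variable_definition>
      <variable id="using_server" type="bool">false</variable>
    </variable_definition>
  </context>
</simulation>
```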

Now, your simulation actually runs: it is able to complete some iterations before crashing, so this is not a bug. At some point you get NaN in the dynamics because the model becomes numerically unstable. So I think there is no bug; the model simply goes numerically unstable. Could you look at the dynamical fields to diagnose the problem?

You don't have numerical instability in the makestart case because the winds are zero and the temperature profile is simple (i.e. it is not a tweaked temperature field as in your case). Such a case is less prone to numerical instability, I think.

debbardet commented 5 years ago

The model could write only one time step, and I already have NaN values in the ps, temperature, u, v and omega fields. Maybe it is a problem in the pressure calculation, because in my start files ps has valid numerical values (it did not change between your restart file and my start file).
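(A quick way to confirm where the NaNs appear is to scan each start field right after it is read. A minimal, self-contained sketch with hypothetical field data; in practice the arrays would come from the start NetCDF file:)

```fortran
program scan_start_fields
  ! Sketch: scan a 2D start field for NaN values before launching a run.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real :: ps(4, 4)

  ps = 1.0e5                                       ! surface pressure, Pa
  ps(2, 3) = ieee_value(ps(2, 3), ieee_quiet_nan)  ! plant a NaN for the demo

  call check_field('ps', ps)

contains

  subroutine check_field(name, field)
    ! Count NaNs in one field and report the first bad location.
    character(len=*), intent(in) :: name
    real,             intent(in) :: field(:,:)
    integer :: i, j, nbad
    nbad = 0
    do j = 1, size(field, 2)
       do i = 1, size(field, 1)
          if (ieee_is_nan(field(i, j))) then
             if (nbad == 0) print *, name, ': first NaN at (', i, ',', j, ')'
             nbad = nbad + 1
          end if
       end do
    end do
    if (nbad > 0) then
       print *, name, ':', nbad, 'NaN value(s) found'
    else
       print *, name, ': clean'
    end if
  end subroutine check_field

end program scan_start_fields
```

Running this kind of check on ps, u, v and the temperature both before and after the first dynamics step would show whether the NaNs are already in the start file or are generated during the run.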

aymeric-spiga commented 5 years ago

Hmm. It looks like something serious is going on with a fundamental atmospheric equilibrium. Let us talk about it face-to-face to investigate.

debbardet commented 5 years ago

Indeed, it was an atmospheric equilibrium issue: over the first time steps the model worked correctly, but after two physics calls it produced NaN values in the DYN diagnostics. This is what produces the segmentation fault in bilinearbig (because it tries to compute an index from a NaN value). So I tested several cases to create better start files:

debbardet commented 5 years ago

The problem with the computation time is the time the model spends writing to icosa_lmdz.out: I had selected level_info=100 and print_file=false. To keep a high info level, we have to use the print_file mode, because it is faster to create one file per processor than to have every processor write to the same file.
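(For reference, these two switches are XIOS variables set in iodef.xml, assuming the standard XIOS names, where the verbosity variable is called info_level; a sketch, the exact placement may differ in this setup:)

```xml
<!-- iodef.xml (sketch): verbose XIOS logging, with one log file per
     process instead of every process writing to the same output -->
<variable_definition>
  <variable id="info_level" type="int">100</variable>
  <variable id="print_file" type="bool">true</variable>
</variable_definition>
```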