aymeric-spiga / dynamico-giant


timestamp problem #10

Open aymeric-spiga opened 5 years ago

aymeric-spiga commented 5 years ago

by @debbardet

I tested Saturn with 64 levels using my modified start files (corrected by commit e1adc6e). I no longer get the error with bilinearbig, but I now get a new error message (obviously!):

```
0000: GETIN start_file_name = start_icosa
0000: -> info : Impossible to get the packet with timestamp = 0
0000: Available timestamp are :
0000: -> info :
0000: > Error [CConstDataPacketPtr CStoreFilter::getPacket(Time timestamp) const] : In file '/scratch/cnt0027/lmd1167/dbardet/dynamico-giant-64lvls/code/XIOS/src/filter/store_filter.cpp', line 54 -> Impossible to get the packet with timestamp = 0
0000:
0000: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
0000: slurmstepd: STEP 5318413.0 ON n1017 CANCELLED AT 2018-10-12T10:26:05
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: n1017: tasks 2-4,6-8,10-16,20,22-23: Killed
srun: Terminating job step 5318413.0
```

Originally posted by @debbardet in https://github.com/aymeric-spiga/dynamico-giant/issues/7#issuecomment-429269271

debbardet commented 5 years ago

I think I found the problem: my start files lack global attributes (such as name, description, conventions and the file timestamp), and XIOS isn't happy about it... I'll update my Python scripts again and it should be OK!
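A minimal sketch of the kind of Python fix described above, assuming the start files are edited with netCDF4-python (whose `Dataset` exposes `ncattrs()` and `setncattr()`). The attribute names follow the list in the comment, but their exact spelling and all the default values are hypothetical placeholders: compare them with the header (`ncdump -h`) of a file actually written by XIOS. A small in-memory stand-in replaces a real file so the sketch runs without netCDF4 installed.

```python
from typing import Dict, List, Protocol


class HasGlobalAttrs(Protocol):
    """Anything exposing the netCDF4.Dataset global-attribute methods."""

    def ncattrs(self) -> List[str]: ...

    def setncattr(self, name: str, value: str) -> None: ...


# Hypothetical placeholder values: replace them with what an XIOS-written
# file actually contains.
DEFAULT_GLOBAL_ATTRS: Dict[str, str] = {
    "name": "start_icosa",
    "description": "start file written by a custom Python script",
    "Conventions": "CF-1.6",
    "timestamp": "2018-10-12 00:00:00",
}


def add_missing_global_attrs(
    ds: HasGlobalAttrs, defaults: Dict[str, str] = DEFAULT_GLOBAL_ATTRS
) -> List[str]:
    """Set every attribute of `defaults` that `ds` lacks; return the names added."""
    present = set(ds.ncattrs())
    added = []
    for key, value in defaults.items():
        if key not in present:
            ds.setncattr(key, value)
            added.append(key)
    return added


class FakeDataset:
    """In-memory stand-in for netCDF4.Dataset, used only for the demo below."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def ncattrs(self):
        return list(self.attrs)

    def setncattr(self, name, value):
        self.attrs[name] = value


demo = FakeDataset({"name": "start_icosa"})
added = add_missing_global_attrs(demo)
print(added)  # ['description', 'Conventions', 'timestamp']
```

On a real file this would be called as `with netCDF4.Dataset("start_icosa.nc", "a") as ds: add_missing_global_attrs(ds)`. Existing attributes are never overwritten, so the function is safe to rerun. In this thread, adding the global attributes alone did not make the timestamp error go away.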

aymeric-spiga commented 5 years ago

Very good! Yes XIOS is very picky.

debbardet commented 5 years ago

Even though I added global attributes to my start files, I still get this error message...

debbardet commented 5 years ago

OK, I think I have a problem with my setup... I tried to run the model with my modified start files and it didn't work. I tried to run the model with 64-level start files created by my makestart and it didn't work. So I tried to run the model with @ehouarn's 64-level start files (directory saturn_64), created by his makestart, and it didn't work either, whereas for @ehouarn it works correctly. I will check that I have the same .xml and .def files.

debbardet commented 5 years ago

I checked my .xml and .def files and they are the same... I compared the version of my code with Ehouarn's, and we don't have the same version:

Ehouarn: LMDZ = r2004, IOIPSL = r308, ICOSAGCM = r735, ICOSA_LMDZ = r2004
Me: LMDZ = r2004, IOIPSL = r310, ICOSAGCM = r740, ICOSA_LMDZ = r2004

I will try with Ehouarn's version.
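To pin down which components actually differ between two setups, a small helper can compare the revision lists. This is only an illustrative sketch: the revisions are copied from the comment above, and the XIOS revision is not listed there, so it would have to be added to both dictionaries before the comparison is complete.

```python
from typing import Dict, Tuple

# Revisions copied from the comment above (XIOS missing on both sides).
EHOUARN = {"LMDZ": 2004, "IOIPSL": 308, "ICOSAGCM": 735, "ICOSA_LMDZ": 2004}
MINE = {"LMDZ": 2004, "IOIPSL": 310, "ICOSAGCM": 740, "ICOSA_LMDZ": 2004}


def revision_diff(
    ref: Dict[str, int], other: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {component: (ref_rev, other_rev)} for components whose revisions differ.

    A component missing on one side is reported with -1 for that side.
    """
    diff = {}
    for comp in sorted(set(ref) | set(other)):
        a, b = ref.get(comp, -1), other.get(comp, -1)
        if a != b:
            diff[comp] = (a, b)
    return diff


print(revision_diff(EHOUARN, MINE))
# {'ICOSAGCM': (735, 740), 'IOIPSL': (308, 310)}
```

Only IOIPSL and ICOSAGCM differ, which narrows the code-version hypothesis to those two components (plus XIOS, once its revisions are known).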

aymeric-spiga commented 5 years ago

Indeed, if it works neither with your start files nor with @ehouarn's start files, there is a problem with the code. Try with exactly the same version as Ehouarn (including the same version of XIOS).

At the same time, this would mean that a problem was introduced in ICOSAGCM between revisions 735 and 740. Any hint about changes that look suspicious?

debbardet commented 5 years ago

Actually, it seems to be a problem with server mode. When I run the model with using_server = true in iodef.xml, I get the timestamp error when start_icosa.nc is read. But when I run with using_server = false, it works normally (there is just the warning message from issue #9).
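Switching between the two runs described above only means flipping `using_server` in iodef.xml. The sketch below does this with the standard-library `xml.etree.ElementTree`. The minimal iodef.xml layout it embeds (a `<variable id="using_server" type="bool">` inside a `parameters` variable group of the `xios` context) is an assumption about the usual XIOS file structure; check it against your own iodef.xml.

```python
import xml.etree.ElementTree as ET

# Assumed minimal iodef.xml layout; check it against your own file.
IODEF = """<simulation>
  <context id="xios">
    <variable_definition>
      <variable_group id="parameters">
        <variable id="using_server" type="bool">true</variable>
        <variable id="info_level" type="int">10</variable>
      </variable_group>
    </variable_definition>
  </context>
</simulation>"""


def _using_server_var(root: ET.Element) -> ET.Element:
    """Find the <variable id="using_server"> element anywhere below root."""
    var = root.find(".//variable[@id='using_server']")
    if var is None:
        raise KeyError("no using_server variable in iodef.xml")
    return var


def get_using_server(root: ET.Element) -> bool:
    """Return True if using_server is set to true (dedicated XIOS server processes)."""
    return (_using_server_var(root).text or "").strip().lower() == "true"


def set_using_server(root: ET.Element, value: bool) -> None:
    """Rewrite using_server in place; false means no separate XIOS server processes."""
    _using_server_var(root).text = "true" if value else "false"


root = ET.fromstring(IODEF)
print(get_using_server(root))  # True
set_using_server(root, False)
print(get_using_server(root))  # False
```

On a real file: `tree = ET.parse("iodef.xml")`, then `set_using_server(tree.getroot(), False)` and `tree.write("iodef.xml")`. Note that `ElementTree` drops XML comments when parsing by default, so keep a copy of the original file.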

aymeric-spiga commented 5 years ago

That makes sense, because this is an XIOS error. We do not allocate many XIOS servers, because that used to be enough with 32 levels. Maybe this no longer holds with 64 levels, because the 3D fields are twice as large.
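The "twice as large" estimate follows from the storage of a 3D field scaling linearly with the number of levels. A quick sketch, assuming double-precision values and a hypothetical number of horizontal cells (the actual grid size is not given in this thread):

```python
def field_megabytes(ncells: int, nlev: int, bytes_per_value: int = 8) -> float:
    """Size in MB (10**6 bytes) of one 3D field stored as ncells x nlev values."""
    return ncells * nlev * bytes_per_value / 1e6


# Hypothetical horizontal resolution, for illustration only.
NCELLS = 100_000

for nlev in (32, 61, 64):
    print(nlev, "levels:", field_megabytes(NCELLS, nlev), "MB per 3D field")

ratio = field_megabytes(NCELLS, 64) / field_megabytes(NCELLS, 32)
print("64 vs 32 levels:", ratio)
```

Whatever the horizontal resolution, the ratio is `64 / 32 = 2`, so the memory an XIOS server must buffer per 3D field doubles when going from 32 to 64 levels.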

aymeric-spiga commented 5 years ago

I tried a 61-level case, with a start file generated by makestart (see https://github.com/aymeric-spiga/dynamico-giant/issues/9#issuecomment-430396484), and it runs without the timestamp problem. I kept using_server set to true. My version of the model is:

DYNAMICO --> Revision: 756
PHYSICS --> Revision: 2005
XIOS --> Revision: 1583
aymeric-spiga commented 5 years ago

I opened a new branch work_61levs (see https://github.com/aymeric-spiga/dynamico-giant/issues/9#issuecomment-430503000) to document the 61-level configuration, in which issue #9 and this issue appear to be solved, so that @debbardet and @ehouarn can try exactly the same configuration: switch to branch work_61levs and voilà.

aymeric-spiga commented 5 years ago

@debbardet: please have a look at the four most recent commits in this new branch (https://github.com/aymeric-spiga/dynamico-giant/commits/work_61levels) and check whether anything in your own configuration files differs (apart from the 61 levels) that could explain this timestamp problem. Otherwise, it may simply be caused by the difference in code version, but I think (I might be wrong) that the problem comes from the configuration.
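One way to do the comparison suggested above is to parse both sets of .def files as `key = value` pairs and list the keys whose values differ. This is a sketch under assumptions: it treats `#` as a comment marker and skips lines without `=`, and the keys and values in the demo are illustrative, not taken from the actual branch.

```python
from typing import Dict, Optional, Tuple


def parse_def(text: str) -> Dict[str, str]:
    """Parse `key = value` lines of a .def file, ignoring blank lines and `#` comments.

    Assumed format; lines without '=' are skipped.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def diff_def(ref: str, mine: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (ref_value, my_value)} for differing keys; a missing key shows up as None."""
    a, b = parse_def(ref), parse_def(mine)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


# Illustrative file contents, not the real branch configuration.
REF = """# reference branch
llm = 61
itau_physics = 5
"""
MINE = """# my setup
llm = 64
itau_physics = 5
day_step = 40   # hypothetical key
"""

print(diff_def(REF, MINE))
# {'day_step': (None, '40'), 'llm': ('61', '64')}
```

Comparing parsed values rather than raw text ignores differences in spacing, comments and line order, so only settings that actually change the run are reported. The XML files (iodef.xml and friends) need a separate, structure-aware comparison.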