aymeric-spiga opened this issue 5 years ago
I think I found the problem: in my start files, the global attributes (such as name, description, conventions and timestamp of the file) are missing. And XIOS isn't happy... I'll change my python scripts again and it should be OK!
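Since the start files are written by python scripts, a minimal sketch of stamping the missing global attributes could look like this. Note the attribute names (`name`, `description`, `conventions`, `timestamp`) and the timestamp format are assumptions taken from this comment; check what your XIOS configuration actually expects.

```python
from datetime import datetime, timezone

# Assumed attribute names, taken from the comment above; the exact set
# XIOS reads may differ in your configuration.
EXPECTED_GLOBAL_ATTRS = ("name", "description", "conventions", "timestamp")

def make_global_attrs(name, description, conventions="CF-1.6"):
    """Build the global-attribute mapping to stamp onto a start file.

    The "CF-1.6" default and the timestamp format are illustrative guesses.
    """
    return {
        "name": name,
        "description": description,
        "conventions": conventions,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
    }

def missing_global_attrs(attrs):
    """Return the expected global attributes absent from an attribute mapping."""
    return [a for a in EXPECTED_GLOBAL_ATTRS if a not in attrs]
```

With the netCDF4-python library, these would be applied to an open file via `ds.setncattr(key, value)` for each pair, and `missing_global_attrs(ds.__dict__)` gives a quick sanity check before handing the file to the model.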
Very good! Yes XIOS is very picky.
Even though I added global attributes to my start files, I still get this error message...
OK, I think I have a problem with my setup... I tried to run the model with my modified start files and it didn't work. I tried to run it with 64-level start files created by my makestart and it didn't work. So I tried to run it with @ehouarn's 64-level start files (directory saturn_64), created by his makestart, and it didn't work either, whereas it ran correctly for @ehouarn. I will check that I have the same .xml and .def files.
I checked my .xml and .def files and they are the same... I compared my code version with Ehouarn's and they differ:

Ehouarn: LMDZ = r2004, IOIPSL = r308, ICOSAGCM = r735, ICOSA_LMDZ = r2004
Me: LMDZ = r2004, IOIPSL = r310, ICOSAGCM = r740, ICOSA_LMDZ = r2004

I will try with Ehouarn's versions.
Indeed if it does not work either with your start files or @ehouarn's start files, that means that there is a problem with the code. Try with the exact same version as Ehouarn (including the same version of XIOS)
At the same time, this would mean that a problem was introduced in ICOSAGCM between revisions 735 and 740; any hint about changes that look suspicious?
Actually, it seems to be a problem with server mode. When I run the model with using_server = true in iodef.xml, I get the timestamp-reading error on start_icosa.nc. But when I run with using_server = false, it works normally (there is just the warning message of issue #9).
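For reference, the workaround above corresponds to a setting like the following in iodef.xml (a sketch: only the relevant variable is shown, the other entries of the real file are omitted):

```xml
<?xml version="1.0"?>
<simulation>
  <context id="xios">
    <variable_definition>
      <!-- false = attached mode (no dedicated I/O server processes);
           true = client/server mode, which triggers the timestamp error here -->
      <variable id="using_server" type="bool">false</variable>
    </variable_definition>
  </context>
</simulation>
```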
That makes sense, because this is an XIOS error. We do not allocate many servers, since that used to work with 32 levels. Maybe it is no longer enough with 64 levels, because the fields are twice as large.
I tried a 61-level case, with a start file generated by makestart
(see https://github.com/aymeric-spiga/dynamico-giant/issues/9#issuecomment-430396484), and it runs without the timestamp problem. I kept using_server set to true. My version of the model is
DYNAMICO --> Revision: 756
PHYSICS --> Revision: 2005
XIOS --> Revision: 1583
I opened a new branch work_61levs
(see https://github.com/aymeric-spiga/dynamico-giant/issues/9#issuecomment-430503000) to document the 61-level configuration, in which issue #9 and this issue appear to be solved, so that @debbardet and @ehouarn can try exactly the same config: switch to branch work_61levs
and voilà
@debbardet: please have a look at the four most recent commits in this new branch https://github.com/aymeric-spiga/dynamico-giant/commits/work_61levels and see if there is anything different in your own configuration files (apart from the 61 levels) that could explain this timestamp problem. Otherwise maybe it is simply caused by the difference in code version, but I think (I might be wrong) that the problem comes from the configuration.
by @debbardet
I tested Saturn with 64 levels with my modified and corrected (by commit e1adc6e) start files. I no longer get the bilinearbig error, but I have a new error message (obviously!):
```
0000: GETIN start_file_name = start_icosa
0000: -> info : Impossible to get the packet with timestamp = 0
0000: Available timestamp are :
0000: -> info :
0000: > Error [CConstDataPacketPtr CStoreFilter::getPacket(Time timestamp) const] : In file '/scratch/cnt0027/lmd1167/dbardet/dynamico-giant-64lvls/code/XIOS/src/filter/store_filter.cpp', line 54 -> Impossible to get the packet with timestamp = 0
0000:
0000: application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
0000: slurmstepd: STEP 5318413.0 ON n1017 CANCELLED AT 2018-10-12T10:26:05
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: n1017: tasks 2-4,6-8,10-16,20,22-23: Killed
srun: Terminating job step 5318413.0
```
Originally posted by @debbardet in https://github.com/aymeric-spiga/dynamico-giant/issues/7#issuecomment-429269271