Closed nierad closed 3 years ago
hmm, this may be an error originating from the gridcell area changes I did... Lars, could you run with a single thread (--npp 1) and post the log?
Hi @goord ,
here it is:
Processing osm leg 002 of experiment LM02
Traceback (most recent call last):
  File "/home/x_larni/miniconda2/envs/ece2cmor3/bin/ece2cmor", line 11, in <module>
Thanks, Lars
Hi Lars can you post the entire log output as well? I just pushed a bug fix but I'm not sure this will solve the underlying issue.
Hi Gijs,
files are attached. LM02-osm-002-20200330161110.log LM02-osm-002-20200330161110.cmor.log
Hi Lars, there is something wrong with your mrsol cmorization. Can you run your cmorization again with the environment variable
export ECE2CMOR3_IFS_CLEANUP=false
and afterwards inspect the files mrsol_Emon.nc and 42.128.111_41.128.111_40.128.111_39.128.111.3? Perhaps they are still there in the directory /tmp/cmorize/LM02/ifs/002/LM02-ifs-1851/. Ece2cmor3 reports it can't find any time stamps in those files, which means they are either incorrectly filtered somehow or missing from the grib files.
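A quick way to check for those intermediates is a small sketch like the following (the directory and file names are taken from this thread; adjust to your run):

```python
import os

def check_intermediates(tmpdir):
    """Report which of the expected intermediate files are present in tmpdir."""
    names = ("mrsol_Emon.nc", "42.128.111_41.128.111_40.128.111_39.128.111.3")
    return {n: os.path.exists(os.path.join(tmpdir, n)) for n in names}

# Path as used in this thread (leg 002, year 1851)
print(check_intermediates("/tmp/cmorize/LM02/ifs/002/LM02-ifs-1851"))
```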
Hi Gijs,
the mrsol_Emon.nc doesn't exist. I have converted the file 42.128.111_41.128.111_40.128.111_39.128.111.3 to nc using
cdo -f nc copy 42...11.3 mrsol.nc
which throws some warnings:
[x_larni@tetralith2 LM02-ifs-1851]$ cdo -f nc copy 42.128.111_41.128.111_40.128.111_39.128.111.3 mrsol.nc
Warning (cgribexScanTimestep1) : Record 5 (id=39.128 lev1=0 lev2=7) timestep 1: Inconsistent verification time!
Warning (cdfDefRgrid) : Creating a NetCDF file with data on a gaussian reduced grid.
Warning (cdfDefRgrid) : The further processing of the resulting file is unsupported!
cdo copy: Processed 1037627840 values from 4 variables over 2919 timesteps ( 10.62s 22MB )
There is no time-dimension. Could this have gotten lost in the cmorization process? I think it's really strange because it only happens in this very leg.
I have uploaded the two files to: http://stormbringer.nateko.lu.se/public/lars/cmor_issue612/
Hi Lars, I can't find what's wrong with that filtered grib file, but for some reason cdo doesn't handle it correctly: it skips the 2nd time step (1851-01-01 03:00).
Hi, is it possible that there is a hole in the data? The OSM output is netcdf, which was converted to grib trying to stay as close as possible to the IFS output.
@nierad can you upload the raw grib files for the first 2 legs?
Hi @etiennesky I don't have enough space to upload all of the files. Which ones do you need? The zipped ones or the ICMGG????? files? Do you need the full year?
Sorry, miscalculated. I'll upload it!
http://stormbringer.nateko.lu.se/public/lars/cmor_issue612/leg1.tar http://stormbringer.nateko.lu.se/public/lars/cmor_issue612/leg2.tar
ok @nierad, just read your (edited) message, downloading them now (will take a bit).
@nierad I cannot see anything wrong with the files, but it might be possible that the problem is due to duplicates, since you linked all the files like this:
ICMGGLM02+185001 -> ICMGG2ctd+185001
@etiennesky I don't think so. We actually reran these legs because of this (and because of a file I destroyed somewhere else). It is the same for the first 10 legs, but it only happens in leg 2.
Lars, what happens if you run
grib_copy -w paramId=39/40/41/42 ICMGGLM02+185001 mrsol.grib
Is the output grib readable by cdo?
Hi Gijs,
it works. When I then do
cdo -f nc copy mrsol.grib mrsolfrmrsol.nc
the resulting nc file looks ok.
ok, but that file you created should be the same as your file 42.128.111_41.128.111_40.128.111_39.128.111.3 that you put on the ftp. So could you somehow diff them?
This is how they differ externally:
740154168 Apr 1 16:32 42.128.111_41.128.111_40.128.111_39.128.111.3
62823360 Apr 3 09:40 mrsol.grib
Oh wait, I'm sorry, the grib file you created is just for one month; you need to do the filter for all gribs and then concatenate. On the other hand, cdo was complaining about the timestamp of record 5, so you may also filter out January of the ece2cmor3-filtered grib and compare...
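The suggested procedure can be sketched as follows (a hypothetical sketch that only builds the command strings; file names follow the thread, and grib messages can simply be concatenated byte-wise):

```python
# Filter paramId 39/40/41/42 from every monthly ICMGG file of 1851 with
# grib_copy, then concatenate the monthly results into one grib file.
months = [f"1851{m:02d}" for m in range(1, 13)]

filter_cmds = [
    f"grib_copy -w paramId=39/40/41/42 ICMGGLM02+{ym} mrsol_{ym}.grib"
    for ym in months
]
concat_cmd = "cat " + " ".join(f"mrsol_{ym}.grib" for ym in months) + " > mrsol1851.grb"

for cmd in filter_cmds + [concat_cmd]:
    print(cmd)
```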
This is how they differ externally:
740154168 Apr 1 16:32 42.128.111_41.128.111_40.128.111_39.128.111.3
739694400 Apr 3 10:25 mrsol1851.grb
after cdo -> nc:
2844948 Apr 1 16:43 42.128.111_41.128.111_40.128.111_39.128.111.3.nc
4150536908 Apr 3 10:26 mrsolfrmrsol1851.nc
How can I filter out January?
When I do a
cdo -f nc selmon,1
on the two grib files, I get no time dimension from the 42.128...-file, while the one from mrsol1851.grb is ok.
Hi Lars I think I found something. When I run
grib_ls -w dataDate=18510101 -p dataDate,dataTime,paraId,typeOfLevel,levels 42.128.111_41.128.111_40.128.111_39.128.111.3
I get
dataDate dataTime paraId typeOfLevel levels
18510101 0 not_found depthBelowLandLayer 0-188
18510101 0 not_found depthBelowLandLayer 188-240
18510101 0 not_found depthBelowLandLayer 240-16
18510101 0 not_found depthBelowLandLayer 16-156
18510101 300 not_found depthBelowLandLayer 0-7
18510101 300 not_found depthBelowLandLayer 7-28
18510101 300 not_found depthBelowLandLayer 28-100
18510101 300 not_found depthBelowLandLayer 100-MISSING
18510101 600 not_found depthBelowLandLayer 0-7
18510101 600 not_found depthBelowLandLayer 7-28
18510101 600 not_found depthBelowLandLayer 28-100
18510101 600 not_found depthBelowLandLayer 100-MISSING
18510101 900 not_found depthBelowLandLayer 0-7
18510101 900 not_found depthBelowLandLayer 7-28
18510101 900 not_found depthBelowLandLayer 28-100
18510101 900 not_found depthBelowLandLayer 100-MISSING
18510101 1200 not_found depthBelowLandLayer 0-7
18510101 1200 not_found depthBelowLandLayer 7-28
18510101 1200 not_found depthBelowLandLayer 28-100
18510101 1200 not_found depthBelowLandLayer 100-MISSING
18510101 1500 not_found depthBelowLandLayer 0-7
18510101 1500 not_found depthBelowLandLayer 7-28
18510101 1500 not_found depthBelowLandLayer 28-100
18510101 1500 not_found depthBelowLandLayer 100-MISSING
18510101 1800 not_found depthBelowLandLayer 0-7
18510101 1800 not_found depthBelowLandLayer 7-28
18510101 1800 not_found depthBelowLandLayer 28-100
18510101 1800 not_found depthBelowLandLayer 100-MISSING
18510101 2100 not_found depthBelowLandLayer 0-7
18510101 2100 not_found depthBelowLandLayer 7-28
18510101 2100 not_found depthBelowLandLayer 28-100
18510101 2100 not_found depthBelowLandLayer 100-MISSING
32 of 11680 messages in 42.128.111_41.128.111_40.128.111_39.128.111.3
As you can see, the first time point has different levels! I believe this may confuse cdo and consequently make it report incorrect timestamps.
Can you confirm that these unknown levels are also there in the original gridpoint grib file?
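The inconsistency can be spotted mechanically; here is a small sketch (hypothetical helper, fed with (dataTime, levels) pairs mirroring the grib_ls listing above) that flags any timestep whose level set deviates from the most common one:

```python
from collections import Counter

# (dataTime, levels) records as reported by grib_ls for 1851-01-01
records = [
    (0, "0-188"), (0, "188-240"), (0, "240-16"), (0, "16-156"),
    (300, "0-7"), (300, "7-28"), (300, "28-100"), (300, "100-MISSING"),
    (600, "0-7"), (600, "7-28"), (600, "28-100"), (600, "100-MISSING"),
]

# Group the level ranges per timestep
levels_per_step = {}
for t, lev in records:
    levels_per_step.setdefault(t, set()).add(lev)

# The expected level set is the one shared by most timesteps
expected, _ = Counter(frozenset(s) for s in levels_per_step.values()).most_common(1)[0]
bad = [t for t, s in levels_per_step.items() if s != set(expected)]
print(bad)  # [0] -- the 00:00 step carries the garbage level bounds
```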
This might be because those variables get a strange post-processing when converting from netcdf to grib in liblsm.sh, function osm_post_output():
## add depthBelowLandLayer to soil moisture/temp to conform to IFS standard (needed for lpjg_forcing)
${grib_set} -w paramId=39/139 -s indicatorOfTypeOfLevel=112,topLevel=0,bottomLevel=7 $FGOUT temp ; mv temp $FGOUT
${grib_set} -w paramId=40/170 -s indicatorOfTypeOfLevel=112,topLevel=7,bottomLevel=28 $FGOUT temp ; mv temp $FGOUT
${grib_set} -w paramId=41/183 -s indicatorOfTypeOfLevel=112,topLevel=28,bottomLevel=100 $FGOUT temp ; mv temp $FGOUT
${grib_set} -w paramId=42/236 -s indicatorOfTypeOfLevel=112,topLevel=100,bottomLevel=missing $FGOUT temp ; mv temp $FGOUT
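For reference, the layer bounds stamped by those grib_set calls can be written down as a small mapping (a sketch; depths in cm, with None standing for bottomLevel=missing, matching the depthBelowLandLayer ranges seen in the healthy grib_ls listing):

```python
# Soil layer bounds (topLevel, bottomLevel) set by osm_post_output() per
# paramId. Any record on codes 39..42 with other bounds (like the
# 0-188 / 188-240 / ... ranges seen at 00h above) is therefore corrupt.
SOIL_LAYERS = {
    39: (0, 7),
    40: (7, 28),
    41: (28, 100),
    42: (100, None),  # bottomLevel=missing
}

def is_expected(param_id, top, bottom):
    """Check a record's layer bounds against the expected mapping."""
    return SOIL_LAYERS.get(param_id) == (top, bottom)

print(is_expected(39, 0, 7), is_expected(39, 0, 188))  # True False
```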
This happens only in the first month of the second year, @nierad? The subsequent years are fine?
@etiennesky : Yes, that's right. Only year 2 (1851).
Looking at the raw data @nierad made available, I don't see anything wrong with it. If I filter the codes 39/40/41/42, I am able to join 185012 with 185101 and cdo doesn't complain.
The problem lies somehow in the filtering.
Mysterious. Does the issue occur over and over again? I'm downloading the leg and will debug the filtering. Lars, which varlist are you using? Are you able to post it, perhaps?
Hi Gijs, I put the file to our ftp-server: http://stormbringer.nateko.lu.se/public/lars/cmor_issue612/cmip6-data-request-varlist-LS3MIP-land-hist-EC-EARTH-Veg.json
Hi. Sorry, I am missing the cmorised file for 1850, too. I don't know if this has been missing from the beginning or just since I reran it. Could that be the issue? @etiennesky:
Looking at the raw data @nierad made available, I don't see anything wrong with it. If I filter the codes 39/40/41/42 I am able to join 185012 with 185101 and cdo doesn't complain.
Does that mean that the bug could be in the 1850-files, too? I will rerun it when I am done with another cmorization (which uses v1.2).
Hi. Sorry, I am missing the cmorised file for 1850, too. I don't know if this has been missing from the beginning or just since I reran it. Could that be the issue?
I would have to see the 3rd year (just the first month) to know for sure.
@etiennesky:
Looking at the raw data @nierad made available, I don't see anything wrong with it. If I filter the codes 39/40/41/42 I am able to join 185012 with 185101 and cdo doesn't complain.
Does that mean that the bug could be in the 1850-files, too? I will rerun it when I am done with another cmorization (which uses v1.2).
Since the bug seems to happen at 185101010300, I thought it might be because of some discontinuity in the output, which doesn't seem to be the case.
The 3rd years cmorized file looks fine. Which file do you want to look at @etiennesky?
The 3rd years cmorized file looks fine. Which file do you want to look at @etiennesky?
the ICMGG file for the first month of the 3rd leg - ICMGG2ctd+185201
Ok, I uploaded it too. http://stormbringer.nateko.lu.se/public/lars/cmor_issue612/ICMGG2ctd+185201
Ok, this happens when cmorizing leg 001:
Processing osm leg 001 of experiment LM02
Traceback (most recent call last):
  File "/home/x_larni/miniconda2/envs/ece2cmor3/bin/ece2cmor", line 11, in <module>
ok, will fix that
Well, I can't say that anything in the data provided is suspicious; all years have a similar structure.
There might be an issue because the first year does not have data on Jan 1st at 00h; it starts at 03h. That is a feature of the OSM, I am afraid.
Thanks all for helping. Another observation: a similar run with different land-use settings ("LM03", net transitions) went through smoothly, i.e. no problems in any year. Same varlist, same metadata file.
The feature that Jan 1st 0h fields are missing seems to be causing random crashes and issues with the hfdsn in the LImon table. Lars, can you check those files: do they have the correct end time each year (yyyy12)?
Hi Gijs,
I am not sure if I understood correctly, but I checked the first 10 years of hfdsn using:
cdo -f nc copy $x/ICMGGLM02+????12 ${x}_12.nc
and they all have the same time dimension, i.e. starting at 0 and ending at 741.
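As a quick sanity check on those numbers (assuming 3-hourly output, which the 03h/06h/... timestamps in this thread suggest), a 31-day December starting at hour 0 indeed ends at hour 741:

```python
# Hours of a 3-hourly December (31 days): 0, 3, ..., 741 -> 248 records,
# consistent with the time axis reported above (0 to 741).
times = list(range(0, 31 * 24, 3))
print(times[0], times[-1], len(times))  # 0 741 248
```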
Also, re "feature": I understand Etienne's comment such that Jan 1st 0:00 values are only missing for the first year of the first leg.
Correction to my earlier post
Another Observation: A similar run with different Landuse-settings ("LM03", net-transitions) did go through smoothly, i.e. no problems in any year. Same varlist, same metadata-file.
There are two years (1962, 1982) that crashed with the same error as 6 posts earlier.
Can you confirm that these unknown levels are there in the original gridpoint grib file?
Damn, I must have missed this comment. This could be a hint! If I do a
for xx in {1850..1858}; do x=${xx}01; grib_ls -w dataDate=${x}01 -p dataDate,dataTime,paraId,typeOfLevel,levels /proj/s-cmip/users/lu/lpjg/gswp3/osm_output/LM02/output/osm/00?/ICMGGLM02+${x}; done
I get this for all years (only the first lines are printed here):
dataDate dataTime paraId typeOfLevel levels
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found surface not_found
18510101 300 not_found depthBelowLandLayer 0-7
18510101 300 not_found depthBelowLandLayer 7-28
18510101 300 not_found depthBelowLandLayer 28-100
18510101 300 not_found depthBelowLandLayer 100-MISSING
18510101 300 not_found surface not_found
...
Note, there is no timestep 0 in any of the grib files. I hope this helps.
P.S.: How can I make it scroll inside the box? Edit: Ok, seems to work after submitting;)
Seems solved, closing. Reopen if necessary.
Hi all, hope you're all still well!
I have an issue cmorizing one leg of an offline run. All other legs are fine. We have already rerun the leg, results are identical. It is an OSM run. Here is the python error:
Processing osm leg 002 of experiment LM02
Traceback (most recent call last):
  File "/home/x_larni/miniconda2/envs/ece2cmor3/bin/ece2cmor", line 11, in <module>
    load_entry_point('ece2cmor3==1.3.0', 'console_scripts', 'ece2cmor')()
  File "/home/x_larni/miniconda2/envs/ece2cmor3/lib/python2.7/site-packages/ece2cmor3-1.3.0-py2.7.egg/ece2cmor3/ece2cmor.py", line 141, in main
    cdothreads=args.ncdo)
  File "/home/x_larni/miniconda2/envs/ece2cmor3/lib/python2.7/site-packages/ece2cmor3-1.3.0-py2.7.egg/ece2cmor3/ece2cmorlib.py", line 188, in perform_ifs_tasks
    ifs2cmor.execute(ifs_tasks + [area_task], nthreads=taskthreads)
  File "/home/x_larni/miniconda2/envs/ece2cmor3/lib/python2.7/site-packages/ece2cmor3-1.3.0-py2.7.egg/ece2cmor3/ifs2cmor.py", line 208, in execute
    pool.map(cmor_worker, proctasks)
  File "/home/x_larni/miniconda2/envs/ece2cmor3/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/x_larni/miniconda2/envs/ece2cmor3/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
TypeError: 'int' object is not iterable
I am not very familiar with either IFS output or ifs2cmor, so I hope you guys will be able to quickly identify what might be causing this. Furthermore, ece2cmor3 doesn't crash; it just hangs.
Thanks a lot! Have a nice weekend!
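A note on reading the traceback above: the "raise self._value" frame means the exception was raised inside a worker process and is merely re-raised by pool.map's result handling in the parent, so the actual failure sits in the cmor worker. The underlying error is simply an iteration over an int where a sequence was expected; a minimal, purely illustrative example (the process function is hypothetical, not ece2cmor3 code):

```python
# Iterating over an int raises exactly the TypeError from the traceback.
def process(chunks):
    return [c for c in chunks]  # fails if chunks is an int instead of a list

try:
    process(3)
except TypeError as exc:
    print(exc)  # 'int' object is not iterable
```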