EC-Earth / ece2cmor3

Post-processing and cmorization of ec-earth output
Apache License 2.0

KeyError when cmorising EC-Earth3-Veg-LR AMIP run #733

Open uwefladrich opened 2 years ago

uwefladrich commented 2 years ago

Hi,

I am getting this error when trying to cmorise an EC-Earth3-Veg-LR AMIP run:

Traceback (most recent call last):
    File "[...]/.conda/envs/ece2cmor/bin/ece2cmor", line 11, in <module>
        load_entry_point('ece2cmor3==1.8.1', 'console_scripts', 'ece2cmor')()
    File "[...]/.conda/envs/ece2cmor/lib/python2.7/site-packages/ece2cmor3-1.8.1-py2.7.egg/ece2cmor3/
        cdothreads=args.ncdo)
    File "[...]/.conda/envs/ece2cmor/lib/python2.7/site-packages/ece2cmor3-1.8.1-py2.7.egg/ece2cmor3/
        ifs2cmor.execute(ifs_tasks, nthreads=taskthreads)
    File "[...]/.conda/envs/ece2cmor/lib/python2.7/site-packages/ece2cmor3-1.8.1-py2.7.egg/ece2cmor3/
        pool.map(cmor_worker, proctasks)
    File "[...]/.conda/envs/ece2cmor/lib/python2.7/multiprocessing/pool.py", line 253, in map
        return self.map_async(func, iterable, chunksize).get()
    File "[...]/.conda/envs/ece2cmor/lib/python2.7/multiprocessing/pool.py", line 572, in get
        raise self._value
    KeyError: 'lat_bnds'

Immediately followed by this error in the ece2cmor log:

ERROR:ece2cmor3.ifs2cmor: CMOR failed to load table Amon, the following variable will be skipped: ts. Reason: Problem with 'cmor.load_table'. Please check the logfile (if defined).

When trying to cmorise another leg of the same experiment, I've seen the same error, but with another variable of the same table. Thus, I do not think it is related specifically to ts.

Any hints what this could be or what I can test?

goord commented 2 years ago

Hmm, this seems to be an issue in the CMOR library, perhaps a mismatch with the table versions...

uwefladrich commented 2 years ago

Hi @goord, is there anything I can try or test? Is it helpful to try and minimise the example to make it easier to reproduce? Or can I check table versions somehow?

goord commented 2 years ago

Hi @uwefladrich yes sorry, there is something you can do: (i) post the version of ece2cmor3 and make sure the cmor-tables are up-to-date and (ii) run sequentially only the Amon-variables and post the full log output here. I will try to reproduce it this evening.
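A sequential run is also useful because `multiprocessing.Pool.map` re-raises only the exception value from the worker. On Python 2.7 the worker's own stack is not part of the traceback, which is why the traceback above ends in pool.py. A minimal sketch of a decorator that keeps the worker-side traceback as text; `traced` and `failing_worker` are hypothetical names used for illustration and are not part of ece2cmor3:

```python
import functools
import traceback


def traced(func):
    """Wrap a worker so the worker-side traceback survives as the exception message."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Re-raise with the formatted worker traceback attached as text.
            raise RuntimeError(traceback.format_exc())
    return wrapper


@traced
def failing_worker(task):
    # Hypothetical stand-in for a worker that fails on a missing key.
    return {"lat": 0.0}["lat_bnds"]


if __name__ == "__main__":
    try:
        failing_worker("ts")
    except RuntimeError as err:
        print(err)
```

The wrapper is applied as a decorator at module level so that a process pool can still pickle the worker by name.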

uwefladrich commented 2 years ago

(i) ece2cmor v1.8.1. I updated the git submodules recursively, but the tables stayed the same, so I assume they are up-to-date.
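One way to look at the table versions directly is to read the header of a table file. A minimal sketch, assuming the tables are CMIP6-style JSON files with a top-level "Header" object; the function name is hypothetical and the path in the usage comment is an assumption about the checkout layout:

```python
import json


def table_header_info(path):
    """Return selected header fields of a CMOR table JSON file."""
    with open(path) as handle:
        header = json.load(handle).get("Header", {})
    keys = ("table_id", "data_specs_version", "cmor_version")
    # Fields that are absent from the header come back as None.
    return {key: header.get(key) for key in keys}


# Usage (the path is an assumption, adjust it to your checkout):
# print(table_header_info("ece2cmor3/resources/tables/CMIP6_Amon.json"))
```

Comparing the reported values between two installations would show whether the tables differ.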

(ii) l610-ifs-005-20220321151854.log (the *.cmor.log file is empty)

goord commented 2 years ago

Strange. In the log you posted, it is the table day that gives an error when being loaded, so there is some randomness in the loading failures. It is also remarkable that the log file is empty, while the message from cmor clearly says 'check the log file'.

goord commented 2 years ago

@uwefladrich could you change line 69 in ece2cmorlib.py from

cmor.setup(table_dir, cmor_mode, logfile=logname, create_subdirectories=(1 if create_subdirs else 0))

to

cmor.setup(table_dir, cmor_mode, logfile=None, create_subdirectories=(1 if create_subdirs else 0))

and then run the cmorization without specifying a log file, maybe more information will be sent to stderr?

uwefladrich commented 2 years ago

Strange. In the log you posted, it is the table day that gives an error when being loaded, [...]

I realise that I created an Amon-only varlist file, but I did not use it in the test run. So I will have to repeat it, but haven't had the time today... I will also try your other suggestion.

uwefladrich commented 2 years ago

I made a few more tests. First of all, I tried the log file changes, but it only got me the same messages on stderr instead of the log file.

Then I ran a couple of tests to isolate the table that causes the issue, tracking things down to fx. If I remove ifs/fx from the varlist, everything works fine.

Note that the error reported in the logs (about the Amon table) seems to be misleading. Not only is there no problem with Amon if I remove fx, but also if I have only fx in the varlist, the run crashes with the KeyError without an error in the log file.

So with fx being a likely candidate for problems, this leads me to think that it could be something that has to do with resolution? This is a cmorisation of EC-Earth3-Veg-LR, has the LR variant had some issues with the fx cmorisation?
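The isolation step described above can be sketched as splitting one varlist into single-table varlists and running each separately. This assumes the varlist is a JSON object nested as component, then table, then a list of variable names (as the "ifs/fx" wording suggests); `split_by_table` and the output file names are hypothetical:

```python
import json


def split_by_table(varlist):
    """Yield (component, table, single-table varlist) for each table in a nested varlist."""
    for component, tables in sorted(varlist.items()):
        for table, variables in sorted(tables.items()):
            yield component, table, {component: {table: list(variables)}}


if __name__ == "__main__":
    # Example content taken from the variables named in this thread.
    example = {"ifs": {"Amon": ["ts"], "fx": ["sftlf", "areacella", "orog"]}}
    for component, table, sub in split_by_table(example):
        # Hypothetical file name for the reduced varlist.
        name = "varlist-{0}-{1}.json".format(component, table)
        print("{0}: {1}".format(name, json.dumps(sub)))
```

The same idea applies one level down: a table that fails can be split into one varlist per variable.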

treerink commented 2 years ago

For instance ece2cmor3/resources/b2share-data/fx-sftlf-EC-Earth3-T159.nc is used.

Do you have the error for all of the fx? If not, do you know which one of the fx causes the problem?

uwefladrich commented 2 years ago

It is fx/sftlf. The other two fx variables (areacella and orog) do not trigger the error.
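Since the traceback ends in KeyError: 'lat_bnds' and sftlf is the only fx variable that triggers it, one check that may be worth trying is whether the file used for sftlf contains bounds variables at all. This is a guess and has not been established in this thread. A minimal sketch, assuming the third-party netCDF4 Python package is installed; both helper names are hypothetical:

```python
def missing_bounds(variable_names, required=("lat_bnds", "lon_bnds")):
    """Return the required bounds variables that are absent from variable_names."""
    return [name for name in required if name not in variable_names]


def variables_in_file(path):
    """List the variable names of a netCDF file (needs the netCDF4 package)."""
    from netCDF4 import Dataset  # imported here so the helper above works without it
    with Dataset(path) as dataset:
        return list(dataset.variables)


# Usage (file name as mentioned in this thread, directory is up to you):
# print(missing_bounds(variables_in_file("fx-sftlf-EC-Earth3-T159.nc")))
```

An empty list would rule this guess out; a non-empty list would only be a hint, since it is not confirmed here where 'lat_bnds' is looked up.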

goord commented 2 years ago

Thanks for tracking this down, @uwefladrich. sftlf is a special variable that requires downloading a file from b2share (there is a download_sftlf function in ifs2cmor.py). Maybe the function hangs during the download, which somehow causes the cmor library to report a failure to load a table (speculating here). Could you try to debug this on your system by inserting some print messages in download_sftlf to see whether the download is needed, whether it is successful, etc.?

The actual download is done on line 1012, cmor_utils.get_from_b2share(fname, fullpath).
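The suggested debugging can be sketched as a small wrapper around the download call that logs the state of the target file before and after. `describe_file` and `logged_download` are hypothetical helpers; the only thing taken from ece2cmor3 is the call shape `get_from_b2share(fname, fullpath)` quoted above:

```python
import logging
import os

log = logging.getLogger("sftlf-debug")


def describe_file(path):
    """Return a short status string for a file that a download should produce."""
    if not os.path.isfile(path):
        return "missing: {0}".format(path)
    return "present: {0} ({1} bytes)".format(path, os.path.getsize(path))


def logged_download(download, fname, fullpath):
    """Call download(fname, fullpath) and log the file state before and after."""
    log.info("before download: %s", describe_file(fullpath))
    try:
        result = download(fname, fullpath)
    except Exception:
        # Log the full traceback, then let the error propagate unchanged.
        log.exception("download of %s raised", fname)
        raise
    log.info("after download: %s", describe_file(fullpath))
    return result


# Usage inside download_sftlf (sketch):
# logged_download(cmor_utils.get_from_b2share, fname, fullpath)
```

Calling `logging.basicConfig(level=logging.INFO)` once at start-up makes these messages visible on stderr.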

treerink commented 2 years ago

Usually on an HPC platform I would recommend (at installation) to run from your ece2cmor3 root directory: ./download-b2share-dataset.sh ${HOME}/cmorize/ece2cmor3/ece2cmor3/resources/b2share-data which ensures that all b2share files are downloaded. If the download is the problem, this might solve it.

uwefladrich commented 2 years ago

I re-initiated the download manually, but it didn't get more/new files, so the problem remains. In particular, fx-sftlf-EC-Earth3-T159.nc is not changed.
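To confirm that a file really is identical before and after a download, or identical to a copy on another machine, a checksum is more reliable than size or timestamp. A minimal sketch using only the standard library; `file_md5` is a hypothetical helper, and whether a reference checksum is published for these files is not stated in this thread:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (file name as mentioned in this thread):
# print(file_md5("fx-sftlf-EC-Earth3-T159.nc"))
```

Running this on two systems and comparing the output would show whether both use the same file.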

goord commented 2 years ago

@uwefladrich can you put a month or a year of data on an FTP server, together with the varlist and metadata json files? If it is not a networking problem, it should be reproducible on our HPC (or KNMI's).