Open uwefladrich opened 2 years ago
hmm seems to be an issue in the CMOR library, perhaps a mismatch with the table versions...
Hi @goord, is there anything I can try or test? Is it helpful to try and minimise the example to make it easier to reproduce? Or can I check table versions somehow?
Hi @uwefladrich yes sorry, there is something you can do: (i) post the version of ece2cmor3 and make sure the cmor-tables are up-to-date and (ii) run sequentially only the Amon-variables and post the full log output here. I will try to reproduce it this evening.
(i) ece2cmor v1.8.1
I updated the git submodules recursively, but the tables stayed the same, so I assume they are up-to-date.
(ii) l610-ifs-005-20220321151854.log (the *.cmor.log file is empty)
Strange. In the log you posted, it is the table day that gives an error when being loaded, so there is some randomness in the loading failures. It is also remarkable that the log file is empty, while the message from cmor clearly says 'check the log file'.
@uwefladrich could you change line 69 in ece2cmorlib.py
from
cmor.setup(table_dir, cmor_mode, logfile=logname, create_subdirectories=(1 if create_subdirs else 0))
to
cmor.setup(table_dir, cmor_mode, logfile=None, create_subdirectories=(1 if create_subdirs else 0))
and then run the cmorization without specifying a log file? Maybe more information will be sent to stderr.
Strange. In the log you posted, it is the table day that gives an error when being loaded, [...]
I realise that I created an Amon-only varlist file but did not use it in the test run. So I will have to repeat the run, but I haven't had the time today... I will also try your other suggestion.
I made a few more tests. First of all, I tried the log file changes, but it only got me the same messages on stderr instead of the log file.
Then I ran a couple of tests to isolate the table that causes the issue, tracking things down to fx. If I remove ifs/fx from the varlist, everything works fine.
Note that the error reported in the logs (about the Amon table) seems to be misleading. Not only is there no problem with Amon if I remove fx, but also if I have only fx in the varlist, the run crashes with the KeyError without an error in the log file.
So with fx being a likely candidate for problems, this leads me to think that it could have something to do with resolution. This is a cmorisation of EC-Earth3-Veg-LR. Has the LR variant had any issues with the fx cmorisation?
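The isolation procedure described above (running with one table at a time until the failure reappears) can be sketched as a small helper that splits a varlist into single-table varlists. This is only a sketch: it assumes the varlist is a JSON object nested as component, then table, then list of variables (as the ifs/fx notation suggests), and the function name split_varlist_by_table is hypothetical, not part of ece2cmor3.

```python
import json


def split_varlist_by_table(varlist):
    """Split a nested varlist into one single-table varlist per table.

    Assumed layout (not confirmed by the thread):
        {"ifs": {"Amon": ["ts", ...], "fx": ["sftlf", ...]}}
    Returns a dict keyed by "component/table".
    """
    pieces = {}
    for component, tables in varlist.items():
        for table, variables in tables.items():
            # Each piece keeps the same nesting, but holds a single table.
            pieces[f"{component}/{table}"] = {component: {table: list(variables)}}
    return pieces


if __name__ == "__main__":
    example = {"ifs": {"Amon": ["ts", "tas"], "fx": ["sftlf", "areacella", "orog"]}}
    for key, piece in split_varlist_by_table(example).items():
        # Write each piece to its own file and cmorise with it to find the failing table.
        print(key, json.dumps(piece))
```

Each piece can be written to its own JSON file and passed to a separate cmorisation run. The same idea applies one level down, splitting a table into single-variable varlists.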
For instance, ece2cmor3/resources/b2share-data/fx-sftlf-EC-Earth3-T159.nc is used.
Do you get the error for all of the fx variables? If not, do you know which one of the fx variables causes the problem?
It is fx/sftlf. The other two fx variables (areacella and orog) do not trigger the error.
Thanks for tracking this down @uwefladrich. sftlf is a special variable that requires downloading a file from b2share (there is a download_sftlf function in ifs2cmor.py). Maybe the function hangs on the download, which somehow causes the cmor library to report a failure to load a table (speculating here). Could you try to debug on your system by inserting some print messages in download_sftlf to see whether the download is needed, whether it is successful, etc.?
The actual download is done on line 1012, cmor_utils.get_from_b2share(fname, fullpath).
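As a rough illustration of the kind of diagnostics meant here, the sketch below wraps a download call and reports whether a download was needed and whether it produced a non-empty file. The wrapper name fetch_with_diagnostics is hypothetical, and the fetch argument merely stands in for a call such as cmor_utils.get_from_b2share(fname, fullpath). The actual body of download_sftlf is not shown in this thread and may differ.

```python
import logging
import os

log = logging.getLogger("sftlf-debug")


def fetch_with_diagnostics(fname, fullpath, fetch):
    """Call fetch(fname, fullpath) only if fullpath is missing, logging each step.

    Returns True if fullpath exists and is non-empty afterwards, else False.
    """
    if os.path.isfile(fullpath):
        log.info("%s already present (%d bytes), no download needed",
                 fullpath, os.path.getsize(fullpath))
    else:
        log.info("%s is missing, downloading %s", fullpath, fname)
        try:
            fetch(fname, fullpath)
        except Exception:
            # Report the failure with a traceback instead of letting it pass silently.
            log.exception("download of %s failed", fname)
            return False
    ok = os.path.isfile(fullpath) and os.path.getsize(fullpath) > 0
    log.info("check of %s after download step: %s", fullpath, "ok" if ok else "FAILED")
    return ok
```

Calling logging.basicConfig(level=logging.INFO) before the cmorisation makes the messages visible on stderr.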
Usually on an HPC platform I would recommend (at installation) to run from your ece2cmor3 root directory:
./download-b2share-dataset.sh ${HOME}/cmorize/ece2cmor3/ece2cmor3/resources/b2share-data
which makes sure that all b2share files are downloaded. If the download is the problem, this might solve it.
I re-initiated the download manually, but it didn't get more/new files, so the problem remains. In particular, fx-sftlf-EC-Earth3-T159.nc is not changed.
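Since the re-download left the file unchanged, one further check is whether the file on disk really looks like a netCDF file and not, say, a truncated download or a saved error page. The sketch below only inspects the leading bytes: classic netCDF files start with CDF followed by a version byte (1, 2 or 5), and netCDF-4 files normally start with the HDF5 signature. The helper name netcdf_kind is hypothetical, and the check says nothing about whether the contents are what ece2cmor3 expects.

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def netcdf_kind(path):
    """Guess the on-disk format from the first bytes; return None if not recognised."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_MAGIC):
        return "netCDF-4/HDF5"
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netCDF classic"
    # Empty files, HTML error pages and truncated downloads end up here.
    return None
```

For example, netcdf_kind("ece2cmor3/resources/b2share-data/fx-sftlf-EC-Earth3-T159.nc") returning None would point to a broken file and not to a problem in the CMOR tables.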
@uwefladrich can you put a month or year of data on an FTP server together with the varlist and metadata json files? If it is not a networking problem, it should be reproducible on our hpc (or knmi's)
Hi,
I am getting this error when trying to cmorise an EC-Earth3-Veg-LR AMIP run:
Immediately followed by this error in the ece2cmor log:
When trying to cmorise another leg of the same experiment, I've seen the same error, but with another variable of the same table. Thus, I do not think it is related specifically to ts.
Any hints what this could be or what I can test?