esm-tools / esm_runscripts

GNU General Public License v2.0

awiesm tidy: ncwa not found #195

Open: ackerlar opened this issue 3 years ago

ackerlar commented 3 years ago

After running the first year of an AWI-ESM run with a cold start for oasis (lresume=False), the tidy job breaks with this error:

cdo seltimestep: Processed 126858 values from 1 variable over 10494 timesteps [8.48s 100MB]
ncwa -O -a time onlyonetimestep.nc notimestep_sst_feom.nc
sh: ncwa: command not found

I am running my jobs on mistral and am currently trying to reproduce this on ollie. Has anyone else experienced the same issue?

(base) a270124@mlogin102% esm_versions check
+---------------------+-------------+-------------------------------------------------------------------------------------------------+-------------------+----------------------+
| package_name        | version     | file                                                                                            | branch            | tags                 |
|---------------------+-------------+-------------------------------------------------------------------------------------------------+-------------------+----------------------|
| esm_calendar        | 5.0.0       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_database        | 5.0.0       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_environment     | 5.1.3       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_master          | 5.1.6       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_motd            | 5.0.2       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_parser          | 5.1.12      | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   | prep_release      | v5.1.7-20-g82f1424   |
| esm_pism            | 0.0.1.dev15 | /mnt/lustre02/work/ba0989/a270124/glacial-inception/software/github.com/esm-tools/esm_pism      | main              | Error                |
| esm_plugin_manager  | 5.0.1       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_profile         | 5.0.0       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_rcfile          | 5.1.0       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
| esm_runscripts      | 5.1.31      | /mnt/lustre02/work/ba0989/a270124/glacial-inception/software/github.com/ackerlar/esm_runscripts | prep_release      | v5.0.14-114-gc5326ea |
| esm_tools           | 5.1.17      | /mnt/lustre02/work/ba0989/a270124/glacial-inception/software/github.com/ackerlar/esm_tools      | prep_release_wiso | v5.1.10-139-g2685d20 |
| esm_version_checker | 5.1.5       | /mnt/lustre01/pf/a/a270124/.local/lib/python3.9/site-packages                                   |                   |                      |
+---------------------+-------------+-------------------------------------------------------------------------------------------------+-------------------+----------------------+
mandresm commented 3 years ago

Hi @ackerlar, it is likely that the environment is not loading nco. Can you provide the experiment directory?

ackerlar commented 3 years ago

Hi @mandresm, here is a test run: /work/ba0989/a270124/PalModII/experiments/test01/scripts. In this run the tidy job did not start automatically (maybe this is the actual issue). When I run the tidy job manually, I get this error:

ncwa -O -a time onlyonetimestep.nc notimestep_sst_feom.nc
sh: ncwa: command not found

When I load nco before running tidy, it seems to work fine.
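A minimal sketch, assuming a POSIX shell, of a pre-flight check that could run before the tidy step so that a missing tool is reported up front instead of failing halfway through post-processing. The function name `missing_tools` is hypothetical and not part of esm_runscripts; `cdo` and `ncwa` (from NCO) are the two commands that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of esm_runscripts):
# print the name of every command that is not on PATH,
# and return non-zero if at least one of them is missing.
missing_tools() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools cdo ncwa || { echo "load nco (and cdo) before running tidy" >&2; exit 1; }`. How nco is actually made available (for instance via an environment module) depends on the machine, so that step is deliberately left out of the sketch.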

ackerlar commented 3 years ago

Somehow the output and restart files are not copied from run_*/work/ to the corresponding experiment directories, and the tidy job gets stuck here:

WARNING: File not found: /work/ba0989/a270124/PalModII/experiments/125ka_awiesm-2.1_corr_jsbach/run_10010101-10011231/work/125ka_awiesm-2.1_corr_jsbach_100101.01_aclcim.nc
2021-10-27 10:56:21.361117
WARNING: File not found: /work/ba0989/a270124/PalModII/experiments/125ka_awiesm-2.1_corr_jsbach/run_10010101-10011231/work/125ka_awiesm-2.1_corr_jsbach_100101.01_glim.nc
2021-10-27 10:56:21.361248
WARNING: File not found: /work/ba0989/a270124/PalModII/experiments/125ka_awiesm-2.1_corr_jsbach/run_10010101-10011231/work/125ka_awiesm-2.1_corr_jsbach_100101.01_jsbid.nc
2021-10-27 10:56:21.361361
WARNING: File not found: /work/ba0989/a270124/PalModII/experiments/125ka_awiesm-2.1_corr_jsbach/run_10010101-10011231/work/125ka_awiesm-2.1_corr_jsbach_100101.01_sp6h.nc
2021-10-27 10:56:21.361479
WARNING: File not found: /work/ba0989/a270124/PalModII/experiments/125ka_awiesm-2.1_corr_jsbach/run_10010101-10011231/work/125ka_awiesm-2.1_corr_jsbach_100101.01_spim.nc
2021-10-27 10:56:21.361588
mandresm commented 3 years ago

Those files do not seem to exist in the work folder; there are some similar files starting with restart_, though. Is this just a naming conflict for the restart files?
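One way to check the naming-conflict hypothesis is to list, for each expected file that is missing from the work folder, any `restart_*` files that share its stream suffix (for example `_aclcim.nc`). The sketch below assumes a POSIX shell; the function name `find_restart_candidates` and the suffix-matching rule are assumptions for illustration, not how esm_runscripts actually resolves restart names.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of esm_runscripts).
# Usage: find_restart_candidates WORKDIR EXPECTED_FILE...
# For each expected file missing from WORKDIR, print it and any
# restart_* files ending in the same "_<stream>.nc" suffix.
find_restart_candidates() {
    workdir=$1
    shift
    for expected in "$@"; do
        if [ -e "$workdir/$expected" ]; then
            continue  # file is present, nothing to report
        fi
        # Keep everything after the last underscore, e.g. "_aclcim.nc"
        suffix="_${expected##*_}"
        echo "missing: $expected"
        for f in "$workdir"/restart_*"$suffix"; do
            # An unmatched glob stays literal, so check that the file exists
            if [ -e "$f" ]; then
                echo "  candidate: ${f##*/}"
            fi
        done
    done
}
```

For example, `find_restart_candidates run_10010101-10011231/work 125ka_awiesm-2.1_corr_jsbach_100101.01_aclcim.nc 125ka_awiesm-2.1_corr_jsbach_100101.01_glim.nc` would show whether a similarly named restart_ file exists for each missing stream.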