ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Issue writing preprocessed cube for CESM2-WACCM dataset #3738

Open rswamina opened 3 weeks ago

rswamina commented 3 weeks ago

ESMValTool seems to have trouble writing a preprocessed cube after extracting IPCC regions for the CESM2-WACCM dataset. I don't know whether the program hangs or is simply taking too long, but the log messages stop at the point where the cube is being saved under the preproc directory. I am using the esmvaltool module installed on JASMIN. As far as I can tell there are no typos!

Here is the recipe:

# ESMValTool
# recipe_test_ssp245_daily_pr.yml
---
documentation:
  description: |
    This is a recipe to download data sets from ESGF nodes and extract IPCC regions.

  authors:
    - swaminathan_ranjini

  title: |

    Recipe to download data from ESGF nodes and extract regions.

  maintainer:
    - swaminathan_ranjini

datasets: 

  - {dataset: CESM2-WACCM, project: CMIP6, exp: historical, ensemble: r(1:3)i1p1f1, start_year: 1995, end_year: 2014, grid: gn}

  - {dataset: CESM2-WACCM, project: CMIP6, exp: ssp245, ensemble: r(1:3)i1p1f1, start_year: 2081, end_year: 2100, grid: gn}

preprocessors:
  preproc_extract_region_land_NCA:
    extract_shape:
      shapefile : IPCC-AR6-shapefiles/IPCC-WGI-reference-regions-v4.shp
      decomposed : False
      method : contains
      crop: True
      ids: 
        - 'N.Central-America'
    mask_landsea:
      mask_out : sea

  preproc_extract_region_land_SCA:
    extract_shape:
      shapefile : IPCC-AR6-shapefiles/IPCC-WGI-reference-regions-v4.shp
      decomposed : False
      method : contains
      crop: True
      ids: 
        - 'S.Central-America'
    mask_landsea:
      mask_out : sea

diagnostics:
  day_pr_NCA:
    description: calculate annual means for region
    variables:
      pr:
        preprocessor: preproc_extract_region_land_NCA
        project: CMIP6
        mip: day
    scripts: null

Here are the final lines of the log output. After the last line shown, the program just waits and I don't know why it hangs there. This recipe works for other models:

2024-08-16 10:48:27,951 UTC [30002] DEBUG   Running preprocessor function 'save' on the data
[<iris 'Cube' of precipitation_flux / (kg m-2 s-1) (time: 7300; latitude: 20; longitude: 29)>]
loaded from original input file(s)
[LocalFile('/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_19900101-19991231.nc'),
 LocalFile('/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_20000101-20091231.nc'),
 LocalFile('/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_20100101-20150101.nc'),
 LocalFile('/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1/fx/sftlf/gn/v20190227/sftlf_fx_CESM2-WACCM_historical_r1i1p1f1_gn.nc')]
with function argument(s)
compress = False,
filename = PosixPath('/work/scratch-nopw2/ranjinis/hot-models-2/recipe_test_ssp245_daily_pr_20240816_104803/preproc/day_pr_CIM/pr/CMIP6_CESM2-WACCM_day_historical_r1i1p1f1_pr_gn_1995-2014.nc')

rswamina commented 3 weeks ago

@valeriupredoi - can you please look into what I might be doing wrong when you get a chance? Thanks!

valeriupredoi commented 3 weeks ago

@rswamina could you please post the entire trace or the debug file? Not much I can gauge from that bit of output :grin:

rswamina commented 3 weeks ago

Yes, of course. I have attached the whole main_log_debug.txt file. Let me know if you cannot access it. main_log_debug.txt

bouweandela commented 2 weeks ago

If all preprocessor functions are lazy, the save step is where the variable (in this case pr) data is loaded from disk, computations happen, and the result is written to the output file. Are you sure nothing is happening? How long did you wait for?

The timestamps in the attached debug log do not match those in the top post.
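The point about lazy preprocessor functions can be illustrated with a toy example. This is a pure-Python analogy, not dask or ESMValCore code: `LazyArray`, `load_from_disk`, `crop`, and `mask` are hypothetical names invented for this sketch. It only shows that building the pipeline does no work, and that loading and all computations run when the final step asks for the result.

```python
class LazyArray:
    """Toy stand-in for a lazy array: records steps and runs them on compute()."""

    def __init__(self, loader, steps=()):
        self._loader = loader
        self._steps = tuple(steps)

    def map(self, func):
        # Only record the operation; nothing is loaded or computed here.
        return LazyArray(self._loader, self._steps + (func,))

    def compute(self):
        # Everything happens here: load the data, then apply each recorded step.
        data = self._loader()
        for func in self._steps:
            data = func(data)
        return data


events = []  # records the order in which work is actually done


def load_from_disk():
    events.append("load")
    return [1.0, 2.0, 3.0, 4.0]


def crop(values):
    events.append("crop")
    return values[1:3]


def mask(values):
    events.append("mask")
    return [value for value in values if value > 2.0]


# Building the pipeline is instant because no step has run yet.
lazy = LazyArray(load_from_disk).map(crop).map(mask)
events_before_save = list(events)

# The analogue of the 'save' step: this triggers load, crop, and mask.
result = lazy.compute()
```

In this analogy, a log that goes quiet at the 'save' step can mean that the deferred loading and computation are running, which is why the waiting time matters.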

rswamina commented 2 weeks ago

I reran it to generate a fresh debug file. Please consider only the timestamps in the file. I waited around 30 minutes and then killed the job. How long should I wait for?

rswamina commented 2 weeks ago

I should add that I tried this several times wondering if there was an issue with JASMIN. The longest I waited (not sure if it was this particular run) was 30 minutes.
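For a sense of scale, the size of the cube being saved can be estimated from the shape printed in the log in the top post (time: 7300; latitude: 20; longitude: 29). The sketch below assumes 4-byte (float32) values, which is an assumption and not something the log states.

```python
import math

# Shape taken from the cube summary in the log: (time, latitude, longitude).
shape = (7300, 20, 29)

# Assumption: values are stored as 32-bit floats (4 bytes each).
bytes_per_value = 4

n_values = math.prod(shape)
size_bytes = n_values * bytes_per_value

print(f"{n_values:,} values, about {size_bytes / 1e6:.1f} MB uncompressed")
```

Under that assumption the output file is roughly 17 MB, so writing the result itself should be quick; a long wait would more likely come from reading the input files or from the deferred computations.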

bouweandela commented 2 weeks ago

I tried running the recipe on my laptop with just the first dataset

  - {dataset: CESM2-WACCM, project: CMIP6, exp: historical, ensemble: r1i1p1f1, start_year: 1995, end_year: 2014, grid: gn}

It completes in about 6 seconds and uses 2GB of RAM with ESMValCore v2.11.0, and takes about 9 seconds and 4GB of RAM with v2.10.0. Maybe it is indeed something specific to JASMIN. Can you access the files that are listed in the top post, e.g. with ncdump?
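Besides ncdump, a quick way to see whether the input files are readable, and how long the filesystem takes to respond, is to time a small read of each file. This is a minimal sketch using only the Python standard library. `probe_file` is a hypothetical helper, and the paths are the ones listed in the log in the top post.

```python
import os
import time


def probe_file(path, nbytes=1024 * 1024):
    """Return (readable, seconds) for reading the first `nbytes` of `path`."""
    start = time.perf_counter()
    if not os.access(path, os.R_OK):
        return False, time.perf_counter() - start
    try:
        with open(path, "rb") as handle:
            handle.read(nbytes)
    except OSError:
        return False, time.perf_counter() - start
    return True, time.perf_counter() - start


if __name__ == "__main__":
    # Input files as listed in the debug log.
    base = "/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2-WACCM/historical/r1i1p1f1"
    paths = [
        f"{base}/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_19900101-19991231.nc",
        f"{base}/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_20000101-20091231.nc",
        f"{base}/day/pr/gn/v20190415/pr_day_CESM2-WACCM_historical_r1i1p1f1_gn_20100101-20150101.nc",
        f"{base}/fx/sftlf/gn/v20190227/sftlf_fx_CESM2-WACCM_historical_r1i1p1f1_gn.nc",
    ]
    for path in paths:
        readable, seconds = probe_file(path)
        print(f"readable={readable} time={seconds:.2f}s {path}")
```

A read that succeeds but takes many seconds for the first megabyte would point towards slow storage rather than a problem in the recipe.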

schlunma commented 2 weeks ago

I had a similar problem with other data that originated from using dask distributed with the mask_landsea data. This will be fixed soon (https://github.com/ESMValGroup/ESMValCore/pull/2515).

bouweandela commented 2 weeks ago

From the debug log it looks like @rswamina is using the default scheduler (so no distributed).

rswamina commented 2 days ago

I am using the default scheduler. On JASMIN, SSP245 has just the r1 ensemble member's data. It seems to hang for both the historical and ssp245 experiments. I tried an ncdump on the files and am able to see the file content under the /badc/ archive path. This was not an issue for other models. I cannot think of what else could be an issue unless someone else can reproduce this on JASMIN. I will also add that I was able to process historical data for this model successfully a few months ago.
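The recipe requests `ensemble: r(1:3)i1p1f1`, which stands for members r1, r2, and r3, while only r1 is reported to be available for ssp245 on JASMIN. The sketch below shows how that range notation expands. `expand_ensemble` is a hypothetical re-implementation for illustration, not the ESMValCore function.

```python
import re


def expand_ensemble(tag):
    """Expand a tag such as 'r(1:3)i1p1f1' into its individual members."""
    match = re.search(r"\((\d+):(\d+)\)", tag)
    if match is None:
        # No range in the tag: it already names a single member.
        return [tag]
    start, end = int(match.group(1)), int(match.group(2))
    expanded = []
    for number in range(start, end + 1):  # the range is inclusive
        candidate = tag[:match.start()] + str(number) + tag[match.end():]
        # Recurse in case the tag contains another range further along.
        expanded.extend(expand_ensemble(candidate))
    return expanded


if __name__ == "__main__":
    for member in expand_ensemble("r(1:3)i1p1f1"):
        print(member)
```

Narrowing the recipe to a single member, as in the laptop test earlier in the thread, keeps the reproduction as small as possible.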

valeriupredoi commented 2 days ago

Am travelling back to the UK today, will have a look tomorrow or on Thu. Before I do that though - Ranjini, could you pls confirm there is enough room on the disk you are writing to, and what iris and esmvalcore versions you are using? Cheers 🍺
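Free space on the output disk can be checked from Python with the standard library. This is a minimal sketch: `free_gib` is a hypothetical helper, and the example path is the scratch directory prefix that appears in the log in the top post. Note that the number reported is for the whole filesystem and may not reflect any per-user or per-group quota.

```python
import shutil
from pathlib import Path


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    path = Path(path)
    # Walk up to the nearest existing parent, so that an output directory
    # which has not been created yet can still be checked.
    while not path.exists() and path != path.parent:
        path = path.parent
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    output_dir = "/work/scratch-nopw2/ranjinis"
    print(f"{free_gib(output_dir):.1f} GiB free for {output_dir}")
```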

rswamina commented 2 days ago

Thanks @valeriupredoi. I have enough space on disk. I am using the esmvaltool module installed on JASMIN for this, here are the version details:

>esmvaltool version
ERROR 1: PROJ: proj_create_from_database: Open of /apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/share/proj failed
ESMValCore: 2.10.0
ESMValTool: 2.10.0

I am not sure how to check the iris version though.
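One way to check the iris version is to import the package inside the same environment and read its `__version__` attribute. A minimal sketch follows. `module_version` is a hypothetical helper, and the list of module names is an assumption about what is installed in the JASMIN esmvaltool environment.

```python
import importlib


def module_version(name):
    """Return the module's __version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # Covers a missing package as well as a package that fails on import.
        return None
    return str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    # Assumed module names; adjust to whatever the environment provides.
    for name in ("iris", "esmvalcore", "esmvaltool", "dask"):
        print(f"{name}: {module_version(name)}")
```

Running this with the Python interpreter of the loaded esmvaltool module should print one line per package, with `None` for anything that cannot be imported.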

valeriupredoi commented 2 days ago

Thanks Ranjini! I know how to do that 😁 Speaking of, I should install 2.11 sooner rather than later