cerfacs-globc / icclim

icclim: Python library for climate indices and climate indicators calculation.
https://icclim.readthedocs.io/en/latest/
Apache License 2.0

BUG: RuntimeWarning: All-NaN slice encountered #226

Open pagecp opened 1 year ago

pagecp commented 1 year ago

Description

Sometimes, when calculating an index, I get an "All-NaN slice encountered" warning. The process is then Killed, but I cannot confirm that the two are related, since only a warning is raised. I will investigate further by running manually on the input file.

Sorry for the sparse information in this issue... it runs in batch, so I have few details until I run it manually.

Minimal reproducible example

/home/jovyan/work/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1-HR/ssp585/r1i1p1f2/CSU/gr/v20191202/CSU_day_CNRM-CM6-1-HR_ssp585_r1i1p1f2_gr_20650101-21001231.nc
Processing CSU and creating /home/jovyan/work/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1-HR/ssp585/r1i1p1f2/CSU/gr/v20191202/CSU_day_CNRM-CM6-1-HR_ssp585_r1i1p1f2_gr_20650101-21001231.nc

Output received

2022-10-13 12:23:43,995 Calculating climate index: CSU
/opt/conda/lib/python3.9/site-packages/dask/array/reductions.py:608: RuntimeWarning: All-NaN slice encountered
  return np.nanmax(x_chunk, axis=axis, keepdims=keepdims)
Killed

bzah commented 1 year ago

I don't think the warning is related to the process being killed. The "All-NaN slice encountered" message is emitted by numpy when np.nanmax (or another nan-aware function) is computed over an array containing only NaNs. See:

>>> import numpy as np
>>> np.nanmax(np.asanyarray([np.nan]))
<ipython-input-173-445f05e52077>:1: RuntimeWarning: All-NaN slice encountered
  np.nanmax(np.asanyarray([np.nan]))
Out[173]: nan
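
To check that the warning on its own is harmless, here is a minimal sketch that records it instead of printing it. The helper name nanmax_with_warning_flag is hypothetical and is not part of icclim or dask; only np.nanmax and the standard warnings module are real APIs here.

```python
import warnings

import numpy as np


def nanmax_with_warning_flag(values):
    """Return (result, warned); warned is True if NumPy emitted an All-NaN RuntimeWarning.

    Hypothetical helper, for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even if it was already shown once elsewhere.
        warnings.simplefilter("always")
        result = np.nanmax(values)
    warned = any(
        issubclass(w.category, RuntimeWarning) and "All-NaN" in str(w.message)
        for w in caught
    )
    return result, warned


# An all-NaN input triggers the warning and yields nan; the call still returns.
res_all, warned_all = nanmax_with_warning_flag(np.array([np.nan, np.nan]))

# A partly valid input yields the maximum of the valid values, without a warning.
res_mixed, warned_mixed = nanmax_with_warning_flag(np.array([np.nan, 3.0]))

print(res_all, warned_all)
print(res_mixed, warned_mixed)
```

In both cases the call returns normally, which is consistent with the warning not being what kills the process.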

I haven't played much with dask recently, but my guess is that it's memory related. You could try giving dask fewer workers and/or fewer threads per worker, or a larger memory pool. Careful though: on a LocalCluster/Client the pool is per worker, so make sure you have n_worker * mem_pool memory available.
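
A rough sketch of that suggestion, assuming the script can create its own dask.distributed cluster before computing the index. n_workers, threads_per_worker and memory_limit are LocalCluster's keyword arguments; the helper names and the values (2 workers, 1 thread, 4 GiB) are hypothetical and would need tuning for the actual machine.

```python
def local_cluster_kwargs(n_workers, threads_per_worker, memory_limit_gib):
    """Build keyword arguments for dask.distributed.LocalCluster.

    Hypothetical helper; memory_limit applies to each worker, not to the whole cluster.
    """
    return {
        "n_workers": n_workers,
        "threads_per_worker": threads_per_worker,
        "memory_limit": f"{memory_limit_gib}GiB",
    }


def total_memory_gib(n_workers, memory_limit_gib):
    """Memory the machine should have free: the per-worker limit times the worker count."""
    return n_workers * memory_limit_gib


def start_client(n_workers=2, threads_per_worker=1, memory_limit_gib=4):
    """Start a local cluster with the given limits and return a connected Client."""
    # Imported lazily so the two helpers above can be used without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        **local_cluster_kwargs(n_workers, threads_per_worker, memory_limit_gib)
    )
    return Client(cluster)


# Budget check before launching anything: 2 workers at 4 GiB each.
required = total_memory_gib(2, 4)
print(f"2 workers x 4 GiB each need about {required} GiB in total")
```

Calling start_client() before the index computation should make subsequent dask computations use that cluster, since a new Client registers itself as the default scheduler. Whether this prevents the process from being killed here is untested.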

Also, I don't know which machine you are running this on, but if it's one of the Cerfacs machines, it could also be due to the file creation issue that Laurent mentioned earlier. Dask can create quite a few files, especially when memory is limited.

pagecp commented 1 year ago

Yes, it may be a memory leak in long-running scripts. It runs on the CMCC cluster. I will investigate, and I guess we can close the issue for now until I have more info.