OGGM / massbalance-sandbox

New generation of OGGM mass-balance models
BSD 3-Clause "New" or "Revised" License

Issues with processing climate data #17

Open JordiBolibar opened 3 years ago

JordiBolibar commented 3 years ago

Hi Lilian,

First of all, thanks a lot for all these tools. They are really comprehensive, and I'm sure I'll find a lot of useful material for my project. I'm particularly interested in the daily climate processing, the temperature scaling, and the initialization of snow/ice fractions.

I've been following the how_to_daily_input_daily_output.ipynb notebook in order to try a few things, and I'm encountering a couple of issues when trying to process daily climate data for a gdir. When I run the following (bear in mind that the source code here is in Julia, but all the libraries it calls are Python), I get this error:

using PyCall                      # Julia -> Python bridge
MBsandbox = pyimport("MBsandbox.mbmod_daily_oneflowline")
cfg = pyimport("oggm.cfg")        # PARAMS lives in oggm.cfg

cfg.PARAMS["hydro_month_nh"] = 1
climate = "W5E5"
# gdir is a glacier directory initialised earlier in the workflow
MBsandbox.process_w5e5_data(gdir, climate_type=climate, temporal_resol="daily")

OSError(-101, 'NetCDF: HDF error')
  File "/Users/Bolib001/Python/oggm/oggm/utils/_workflow.py", line 490, in _entity_task
    out = task_func(gdir, **kwargs)
  File "/Users/Bolib001/Python/massbalance-sandbox/MBsandbox/mbmod_daily_oneflowline.py", line 435, in process_w5e5_data
    with xr.open_dataset(path_tmp) as ds:
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/api.py", line 497, in open_dataset
    backend_ds = backend.open_dataset(
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 551, in open_dataset
    store = NetCDF4DataStore.open(
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 380, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 328, in __init__
    self.format = self.ds.data_model
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 389, in ds
    return self._acquire()
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/netCDF4_.py", line 383, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/file_manager.py", line 187, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/Users/Bolib001/miniconda3/envs/oggm_env/lib/python3.9/site-packages/xarray/backends/file_manager.py", line 205, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src/netCDF4/_netCDF4.pyx", line 2307, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1925, in netCDF4._netCDF4._ensure_nc_success
PyError ($(Expr(:escape, :(ccall(#= /Users/Bolib001/.julia/packages/PyCall/BD546/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'OSError'>
OSError(-101, 'NetCDF: HDF error')

After looking around a little bit, this seems to be quite a common bug, which can have several causes: either corrupted .nc files or problems with the installation order of the h5py and netCDF4 libraries. I have tried several things without success so far. I was just wondering if you have already encountered this problem and if you know of any easy fix :)
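To test the library-conflict hypothesis, this is roughly the kind of quick check I tried (just a sketch; the file name is a placeholder for any .nc file that normally opens fine):

# Sketch of an import-order check (placeholder file name): importing netCDF4
# before h5py (or the other way round) sometimes changes whether opening a
# file raises the HDF error.
import netCDF4  # imported first on purpose
import h5py
import xarray as xr

with xr.open_dataset("some_known_good_file.nc") as ds:  # placeholder path
    print(ds)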

I also tested MBsandbox.process_era5_daily_data(gdir) as suggested in the notebook, but it just seems to run forever (I stopped it after several minutes). Is it OK to use this function, or am I missing something?

Thanks a lot again!

lilianschuster commented 3 years ago

Hi Jordi,

It is possible that MBsandbox.process_era5_daily_data(gdir) and MBsandbox.process_w5e5_data take some time to run, because the first time they are called they have to download the (daily) data. Depending on your internet connection, this can take a while (for ERA5_daily it is 662 MB; note that there only the temperature is daily and the monthly estimates are used for precipitation, while for W5E5 both prcp and temp are daily, i.e. 2*280 MB). So, for testing daily MB you should rather use W5E5, also because my preprocessed W5E5 data goes until the end of 2019, whereas the ERA5 data only goes until the end of 2018.

You can also check manually inside the folders whether the data has been downloaded correctly. In my case it is under /home/lilianschuster/OGGM/download_cache/cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily (e.g. try to open it with xarray, as in the sketch below). If the download did not work, you could also manually download the files from this link and drop them in the right folder. If you still cannot open the file with xarray, even though you can normally open files with xarray, then we at least know a bit better where the problem is.
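A minimal sketch of that check (the cache path is from my machine; yours will differ):

# Sketch: open one of the downloaded W5E5 files directly with xarray.
# The path below is the default download cache on my machine.
import xarray as xr

path = ("/home/lilianschuster/OGGM/download_cache/cluster.klima.uni-bremen.de/"
        "~lschuster/w5e5v2.0/flattened/daily/"
        "w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc")
with xr.open_dataset(path) as ds:
    print(ds)  # if this raises the HDF error, the downloaded file itself is the problem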

I hope that helps a bit. This is all work in progress, so if you encounter other problems, don't hesitate to ask. In particular, the surface type distinction model using a snow ageing bucket system is still not well documented.

JordiBolibar commented 3 years ago

Thanks for the quick reply! Now I'm getting a bit more information. The current situation is the following:

When I run MBsandbox.process_w5e5_data(gdir, climate_type=climate, temporal_resol="daily") I get:

2021-09-30 17:46:00: oggm.cfg: PARAMS['hydro_month_nh'] changed from `10` to `1`.
2021-09-30 17:46:00: MBsandbox.mbmod_daily_oneflowline: (RGI60-11.01450) process_w5e5_data
2021-09-30 17:46:00: oggm.utils: No known hash for cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily/w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc
2021-09-30 17:46:00: oggm.utils: No known hash for cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily/w5e5v2.0_pr_global_daily_flat_glaciers_1979_2019.nc
2021-09-30 17:46:00: oggm.utils: No known hash for cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily/w5e5v2.0_glacier_invariant_flat.nc
2021-09-30 17:46:00: MBsandbox.mbmod_daily_oneflowline: OSError occurred during task process_w5e5_data on RGI60-11.01450: [Errno -101] NetCDF: HDF error: b'/Users/Bolib001/OGGM/download_cache/cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily/w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc'

Opening the file directly with xr.open_dataset("/Users/Bolib001/OGGM/download_cache/cluster.klima.uni-bremen.de/~lschuster/w5e5v2.0/flattened/daily/w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc") produces exactly the same error as above.

However, manually downloading the file from the server you pointed me to and opening it with xarray works with no issues. To me, it looks like the problem comes from the file that process_w5e5_data downloads automatically. FYI, I didn't have any issues when downloading the "classic" monthly CRU data from OGGM.

lilianschuster commented 3 years ago

Hi Jordi,

Thanks for reporting back. I also just tested deleting the files and running process_w5e5_data: for me, it downloads the files normally and I can work with them afterwards. If the manual way works for you, you can simply stick with it: put the files in the right folder yourself; then, when you run process_w5e5_data, OGGM won't download the files again and you can hopefully work with the climate files (a minimal sketch of this is below). But I will ask Fabien next week how to solve this. It probably has something to do with the fact that there is no hash for these files ... or, since you get an OSError, it might also be a problem with the path handling. Do you use Linux? I have only tested this on Linux systems.
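A minimal sketch of that manual workaround (the cache path is the default on my machine, the Downloads folder is just an example source, and gdir is the glacier directory from your existing workflow):

# Sketch of the manual workaround: remove any (possibly corrupted) cached
# copies, drop the manually downloaded files into the same folder, then run
# process_w5e5_data again so OGGM picks them up instead of re-downloading.
import os
import glob
import shutil

cache_dir = os.path.expanduser(
    "~/OGGM/download_cache/cluster.klima.uni-bremen.de/"
    "~lschuster/w5e5v2.0/flattened/daily")

for f in glob.glob(os.path.join(cache_dir, "*.nc")):
    os.remove(f)  # only if these cached files turn out to be corrupted

# copy the manually downloaded files into the cache (example source folder)
for f in glob.glob(os.path.expanduser("~/Downloads/w5e5v2.0_*.nc")):
    shutil.copy(f, cache_dir)

# re-run the processing task; OGGM will now find the files in the cache
from MBsandbox.mbmod_daily_oneflowline import process_w5e5_data
process_w5e5_data(gdir, climate_type="W5E5", temporal_resol="daily")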

fmaussion commented 3 years ago

What can happen is that the files are only partly downloaded if the user interrupts the process while it is running, which could lead to the HDF errors reported here.

These files are probably not yet in the "verified" list (where we check for hashes) - we should probably add them soon.
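In the meantime, a generic way to check for a truncated download (this is not OGGM's own hash verification, just a sketch, and the manual-copy path is an example) is to compare the file size and checksum of the cached file against the manually downloaded one:

# Generic integrity check (not OGGM's verification code): compare size and
# sha256 of the cached file with a manually downloaded copy of the same file.
import hashlib
import os

def sha256sum(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

cached = ("/Users/Bolib001/OGGM/download_cache/cluster.klima.uni-bremen.de/"
          "~lschuster/w5e5v2.0/flattened/daily/"
          "w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc")
manual = "/Users/Bolib001/Downloads/w5e5v2.0_tas_global_daily_flat_glaciers_1979_2019.nc"  # example path

for p in (cached, manual):
    print(p, os.path.getsize(p), sha256sum(p))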

JordiBolibar commented 3 years ago

OK, so I manually placed the files in the default folder and it solved the issue. The automatically downloaded files might be somehow corrupted.

Thanks a lot for the help! Shall I keep the issue open?