alan-turing-institute / deepsensor

A Python package for tackling diverse environmental prediction tasks with NPs.
https://alan-turing-institute.github.io/deepsensor/
MIT License

`get_earthenv_auxiliary_data` requires `rioxarray` #106

Closed by davidwilby 3 months ago

davidwilby commented 3 months ago

Description

deepsensor.data.sources.get_earthenv_auxiliary_data, which is used in several parts of the "Getting Started" and "User Guide" sections of the docs, is currently broken with DeepSensor v0.3.6 and xarray 2024.2.0 (also tested with xarray 2023.12.0).

get_earthenv_auxiliary_data downloads TIF files from EarthEnv, which are then opened with xarray.open_dataset. As far as I can work out, xr.open_dataset can't open TIFs with any of xarray's currently installed backends.

Calling get_earthenv_auxiliary_data causes xarray to raise a ValueError:

```
ValueError: did not find a match in any of xarray's currently installed IO backends 
['netcdf4', 'h5netcdf', 'scipy', 'pydap', 'pynio', 'zarr']. Consider explicitly selecting one of 
the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html
```
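The error lists only the engines xarray found at import time. A minimal sketch for printing that same list, and for checking whether a TIF-capable engine is present, is below. It assumes xarray's xarray.backends.list_engines() and that rioxarray registers its xarray backend under the engine name "rasterio"; installed_engines and can_open_tif are hypothetical helper names.

```python
"""List xarray's installed IO engines and check for a TIF-capable one."""


def installed_engines():
    """Return the sorted names of xarray's installed IO engines.

    Returns an empty list if xarray itself is not installed.
    """
    try:
        import xarray as xr
    except ImportError:
        return []
    # list_engines() maps engine name -> BackendEntrypoint object
    return sorted(xr.backends.list_engines())


def can_open_tif(engines):
    """True if the "rasterio" engine (registered by rioxarray) is available."""
    return "rasterio" in engines


if __name__ == "__main__":
    engines = installed_engines()
    print("xarray engines:", engines)
    if not can_open_tif(engines):
        print("No 'rasterio' engine found: install rioxarray to read TIF files.")
```

With the engine list from the error message above, can_open_tif returns False, which matches the failure described in this issue.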

According to the xarray docs linked in the error message (https://docs.xarray.dev/en/stable/user-guide/io.html#rasterio), TIF handling is supported using the rioxarray package by calling it directly, e.g.:

```python
import rioxarray
rds = rioxarray.open_rasterio("RGB.byte.tif")
```

rather than via open_dataset.
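A minimal sketch of the two ways to read a TIF once rioxarray is installed: calling rioxarray.open_rasterio directly, as the xarray docs suggest, or going through xr.open_dataset with engine="rasterio" (the engine rioxarray registers with xarray). open_tif is a hypothetical helper for illustration, not DeepSensor's implementation.

```python
import importlib.util


def open_tif(path, as_dataset=False):
    """Open a (Geo)TIFF with rioxarray.

    Hypothetical helper for illustration: not part of DeepSensor's API.
    """
    if importlib.util.find_spec("rioxarray") is None:
        # Without rioxarray, none of xarray's installed engines can read TIFs
        raise ImportError("Reading TIF files requires rioxarray: pip install rioxarray")
    if as_dataset:
        import xarray as xr

        # Goes through the "rasterio" engine that rioxarray registers with xarray
        return xr.open_dataset(path, engine="rasterio")
    import rioxarray

    # For a single GeoTIFF this is usually a DataArray with dims (band, y, x)
    return rioxarray.open_rasterio(path)
```

Checking for rioxarray up front replaces xarray's generic "no matching backend" ValueError with a message that names the missing package, e.g. open_tif("RGB.byte.tif") fails with a pip install hint instead.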

So I'm confused as to how xr.open_dataset in get_earthenv_auxiliary_data has ever worked. I've been through the file history for deepsensor/data/sources.py in this repo and the changelog for xarray, and can't find anything to indicate why this behaviour would have changed in either package. Was another backend used with xarray previously? Was TIF reading supported in an earlier version?

I have a branch using rioxarray ready to go if that is the right answer here.

P.S. Sorry for the flurry of issues lately; I'm going through the docs and logging anything I catch that needs documenting or fixing.

Reproduction steps

1. Follow installation instructions.
2. Run the following snippet:

```python
from deepsensor.data.sources import get_earthenv_auxiliary_data

extent = "europe"
cache_dir = ".datacache"
auxiliary_var_IDs = ["elevation", "tpi"]
# Raises the ValueError above when xarray has no engine that can read TIFs
da = get_earthenv_auxiliary_data(auxiliary_var_IDs, extent, "1KM", cache=True, cache_dir=cache_dir)
```

Version

0.3.6
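When reporting which versions are affected, a small standard-library snippet can collect the installed versions of the packages involved here (DeepSensor, xarray and rioxarray). package_versions is a hypothetical helper; it relies only on importlib.metadata.

```python
"""Collect installed versions of the packages involved in this issue."""
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ("deepsensor", "xarray", "rioxarray")


def package_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```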

Screenshots

No response

OS

Linux

tom-andersson commented 3 months ago

Thanks @davidwilby! Can you try pip installing rioxarray and repeating?

https://github.com/alan-turing-institute/deepsensor/blob/main/requirements/requirements.txt#L4

If that works, I think we just need to tell people to run `pip install -r requirements/requirements.txt` before using the documentation notebooks.
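As a rough pre-flight check before running the notebooks, the sketch below reports which packages listed in a requirements file cannot be imported. It is stdlib-only and approximate: it assumes each package's import name equals its requirement name with "-" replaced by "_", apart from the entries in a small, hypothetical IMPORT_NAMES table, and it ignores version pins and environment markers.

```python
"""Report which packages in a requirements file cannot be imported."""
import importlib.util
import re
from pathlib import Path

# Illustrative (not exhaustive) map from requirement name to import name
IMPORT_NAMES = {"scikit-learn": "sklearn", "pyyaml": "yaml"}


def requirement_names(text):
    """Extract bare distribution names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # The name ends at the first version specifier, extra, marker or space
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_modules(names):
    """Return the requirement names whose module cannot be found."""
    missing = []
    for name in names:
        module = IMPORT_NAMES.get(name.lower(), name.replace("-", "_"))
        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:  # parent package of a dotted name is missing
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements/requirements.txt")
    if path.exists():
        print("Missing:", missing_modules(requirement_names(path.read_text())))
```

If rioxarray is not installed, it shows up in the "Missing" list, which points straight at the cause of the ValueError in this issue.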

tom-andersson commented 3 months ago

This relates to https://github.com/alan-turing-institute/deepsensor/issues/102 about the docs not being clear enough about set-up for running the doc notebooks.

davidwilby commented 3 months ago

Absolutely, can confirm that having rioxarray installed allows xarray to read the EarthEnv TIFs correctly. Will close this issue and look at adding a brief note to the docs about installing the optional dependencies, which I believe may already be mentioned.