Closed Daafip closed 5 months ago
Nice that the dataset is available on Zenodo now. When I chatted with Frederik (the PI) last EGU, you basically had to go through Google Earth Engine to get the data.
It is a bit frustrating that everything is in a single zip file though. It would have been nice if the parts were more split up.
For integration with eWaterCycle I see one main hurdle:
dataset of meteorological forcing data, catchment attributes, and discharge data
This combines eWaterCycle's separate forcing, parameter sets and observations in one block. So if you'd want to integrate this, it would have to be split up into a CaravanForcing, a caravan ParameterSet definition, and a caravan.py module in observations.
Of course this is still completely viable.
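As a rough illustration of that split, here is a minimal sketch using plain xarray. It does not use any eWaterCycle classes; the function name `split_caravan` and the grouping of variables are assumptions made for the example, and only the variable names come from this thread. The catchment attributes (the parameter-set part) live in separate files and are left out.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Variable names as mentioned in this thread; the grouping is an assumption.
FORCING_VARS = ["total_precipitation_sum", "potential_evaporation_sum"]
OBSERVATION_VARS = ["streamflow"]


def split_caravan(ds: xr.Dataset) -> tuple[xr.Dataset, xr.Dataset]:
    """Split one combined basin dataset into forcing and observations."""
    forcing = ds[FORCING_VARS]
    observations = ds[OBSERVATION_VARS]
    return forcing, observations


# Tiny synthetic stand-in for a single basin file (not real Caravan data).
demo = xr.Dataset(
    {name: ("date", np.arange(3.0)) for name in FORCING_VARS + OBSERVATION_VARS},
    coords={"date": pd.date_range("2000-01-01", periods=3)},
)
forcing, observations = split_caravan(demo)
```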
We do not have a ZenodoDownloader implemented, but there is a placeholder here.
As far as I understand we are allowed to redistribute it?
Yes. But we don't need to if we would implement a downloader. The users themselves would be downloading it.
The naming scheme is also different (to be expected). This is what worked for me to use the CAMELS files locally:
date is used as dimension rather than time:
    ds = ds.rename_dims({'date': 'time'})
    ds = ds.rename({'date': 'time'})
pr, pev etc. all have different names. To get HBV to run:
    RENAME_CAMELS = {'total_precipitation_sum': 'pr',
                     'potential_evaporation_sum': 'pev',
                     'streamflow': 'Q'}
    ds = ds.rename(RENAME_CAMELS)
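The two renaming steps can be wrapped in one helper. This is only a sketch: the function name `to_model_names` is made up for the example, and the demo dataset is synthetic. It relies on `Dataset.rename` renaming a dimension and its coordinate in a single call.

```python
import numpy as np
import pandas as pd
import xarray as xr

RENAME_CAMELS = {
    "total_precipitation_sum": "pr",
    "potential_evaporation_sum": "pev",
    "streamflow": "Q",
}


def to_model_names(ds: xr.Dataset) -> xr.Dataset:
    """Rename the date axis to time and map the variable names."""
    # Dataset.rename handles the dimension and its coordinate together.
    ds = ds.rename({"date": "time"})
    # Only rename variables that are actually present in this dataset.
    present = {old: new for old, new in RENAME_CAMELS.items() if old in ds}
    return ds.rename(present)


# Synthetic stand-in for one basin file (not real Caravan data).
demo = xr.Dataset(
    {name: ("date", np.zeros(2)) for name in RENAME_CAMELS},
    coords={"date": pd.date_range("2000-01-01", periods=2)},
)
renamed = to_model_names(demo)
```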
I had a look at the Caravan data. There is one file per catchment, and the attributes are in separate files. The netCDF files do not use the variable attributes; the units are instead defined in the general attrs... :face_with_spiral_eyes:
It would be possible to reorganize this, move the variable attributes to the proper locations, and merge the separate basin files in a single netCDF (per camel). On a new "basin" dimension, you can then add the basin's ID as coordinates, as well as the metadata as additional variables.
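A minimal sketch of that reorganization, on synthetic data. This is not the code from the conversion notebook: the helper names, the basin IDs, the `area` variable and the units mapping are all hypothetical, and the units are passed in explicitly rather than parsed from the general attrs.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_basin(offset: float) -> xr.Dataset:
    """Synthetic stand-in for one per-basin file."""
    return xr.Dataset(
        {"streamflow": ("date", np.arange(3.0) + offset)},
        coords={"date": pd.date_range("2000-01-01", periods=3)},
    )


def merge_basins(basins: dict[str, xr.Dataset], units: dict[str, str]) -> xr.Dataset:
    """Concatenate per-basin datasets along a new 'basin' dimension."""
    # The index values become the coordinate of the new dimension.
    ids = pd.Index(list(basins), name="basin")
    merged = xr.concat(list(basins.values()), dim=ids)
    # Put the units on the variables themselves.
    for name, unit in units.items():
        merged[name].attrs["units"] = unit
    return merged


basins = {"demo_basin_1": make_basin(0.0), "demo_basin_2": make_basin(10.0)}
merged = merge_basins(basins, units={"streamflow": "mm/d"})
# Metadata can be added as extra variables on the basin dimension.
merged["area"] = ("basin", [12.5, 80.0])
```

A single basin can then be selected with `merged.sel(basin="demo_basin_2")`.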
The netCDF files are also not optimally compressed. I was able to compress them to 38% of the original netCDF size, so all Caravan netCDF files together would only be about 6.3 GB.
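For reference, compression in xarray is usually switched on per variable through the `encoding` argument of `to_netcdf`. The sketch below only builds that dictionary. The helper name and the compression level are assumptions; the settings that gave the 38% figure are not stated in this thread.

```python
def compression_encoding(variable_names, complevel=4):
    """Build a per-variable encoding dict that enables zlib compression."""
    return {name: {"zlib": True, "complevel": complevel} for name in variable_names}


encoding = compression_encoding(["pr", "pev", "Q"])

# With an xarray Dataset `ds` holding these variables, it would be used as:
# ds.to_netcdf("caravan_compressed.nc", encoding=encoding)
```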
I think before adding the caravan dataset to eWaterCycle, we'd have to go through the following steps:
data.4tu.nl, so that they can be accessed using OPeNDAP.
To then get the data for a CAMELS basin, all that's needed is:
import xarray as xr

def get_camels(dataset: str, basin_id: str):
    ds = xr.open_dataset(f"https://data.4tu.nl/.../{dataset}")
    return ds.sel(basin=basin_id)
It was faster to just write the conversion notebook than to discuss this/think of when to do this.
Here's the notebook: https://gist.github.com/BSchilperoort/256751fe2ea060c50b103f72026590a2
Now we'd just need to upload it to https://data.4tu.nl (along with the shapefiles as well perhaps...). However, the free limit for non-associated Dutch researchers is 5 GB/year, and the data will be just over 6 GB. Once I have the TU Delft guest account I can upload it.
I got around this as a student: https://data.4tu.nl/collections/bf0eaf7c-f2fa-46f6-b8cd-77ad939dd350
Waiting now for my submission of the data to be approved. Once that goes through I'll:
By the way, there's also a GRDC extension to Caravan now: https://zenodo.org/records/8425587
Now on the OPeNDAP server: Kratzert, Frederik; Schilperoort, Bart; Haasnoot, David; Hut, R.W. (Rolf) (2024): Caravan - A global community dataset for large-sample hydrology. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/ca13056c-c347-4a27-b320-930c2a4dd207
To then get the data for a CAMELS basin, all that's needed is:
def get_camels(dataset: str, basin_id: str):
    ds = xr.open_dataset(f"https://data.4tu.nl/.../{dataset}")
    return ds.sel(basin=basin_id)
Works for me using:
import xarray as xr

def get_camels(dataset: str, basin_id: str):
    ds = xr.open_dataset(f"https://opendap.4tu.nl/thredds/dodsC/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/1/{dataset}.nc")
    # the basin IDs are selected as bytes, hence the encode()
    return ds.sel(basin_id=basin_id.encode())
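Since the selection above uses encoded IDs, the `basin_id` coordinate appears to hold bytes. A small helper to list the available IDs as plain strings could look like the sketch below. The function name `find_basin_ids` and the demo IDs are hypothetical, and the demo dataset is built in memory rather than read from the server.

```python
import numpy as np
import xarray as xr


def find_basin_ids(ds: xr.Dataset, text: str) -> list[str]:
    """Return all basin IDs (decoded to str) that contain `text`."""
    ids = [
        raw.decode() if isinstance(raw, bytes) else str(raw)
        for raw in ds["basin_id"].values
    ]
    return [basin for basin in ids if text in basin]


# Synthetic stand-in for the remote dataset, with IDs stored as bytes.
demo = xr.Dataset(
    coords={"basin_id": np.array([b"demo_a_0001", b"demo_a_0002", b"demo_b_0001"])}
)
matches = find_basin_ids(demo, "demo_a")
```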
Now available in the main branch. One downside (of the whole dataset) is that it can be difficult to find which basin_id you actually want as a user. Right now that involves looking through the dataset and/or downloading the shapefile manually. We could easily incorporate something like a webmap/folium map. See this example I made previously for KNMI weather stations.
Can be found here: https://github.com/Daafip/caravan-map
Nice! Can't you host it on github pages?
Hadn't thought of that
Thanks for adding this, David.
Peter and Stefan liked the map view a lot, and see potential in it to use it to make a simpler "click on the basin, get an ewatercycle notebook with forcing generation + model(s)" interface (something eWaterCycle used to have), but that's for some other time.
'Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world'. With a notable publication.
It would be nice to easily access this as a standardised dataset in eWaterCycle. Currently you need to download a 12 GB zip file and split the files yourself. Is there a way to nicely integrate this?
'The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.' As far as I understand we are allowed to redistribute it?