Add CAMELS-USA

Daafip commented 6 months ago

The Caravan dataset has already been added in PR https://github.com/eWaterCycle/ewatercycle/pull/407, as discussed in #398. This was relatively simple because the NetCDF files were available from the source; we only had to combine them. The downside is that it uses ERA5-Land data, whose evaporation can be quite far from realistic. See this article on the issue.

As part of my thesis I used the original CAMELS-USA dataset, which has better forcing. However, the data is in text files, split per type: forcing/streamflow/characteristics. I ran models for all 671 catchments and, in the process, already converted the forcing to NetCDF. I only used a 5-year period, although data is available for 1980-2010 (in some cases until 2014). I used custom forcing in the HBV model to achieve this. It would also be nice to include the catchment characteristics. These are currently spread across different files, and comparing your results to them requires a bit of pandas effort, as shown in this messy notebook (or use this link to view it online). Definitely doable, but it takes effort. Loading observations is shown here.

TL;DR: the original CAMELS forcing is better than Caravan's. Code exists, but it still takes some effort to polish.
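To illustrate the text-to-array conversion step mentioned above, here is a minimal, stdlib-only sketch of reading one CAMELS basin forcing text file. It assumes the layout of the CAMELS basin mean forcing files: three metadata lines (latitude, elevation, area), one whitespace-separated header line starting with `Year Mnth Day Hr`, then one row per day. The function `read_camels_forcing` is hypothetical and not part of ewatercycle; the actual thesis code may differ, and a real conversion would go on to write these arrays to NetCDF (for example with xarray).

```python
import io
from datetime import date


def read_camels_forcing(stream):
    """Parse a CAMELS-style basin forcing text file (assumed layout).

    Assumed layout: three metadata lines (latitude, elevation, area),
    one header line, then whitespace-separated daily rows that start
    with Year, Mnth, Day, Hr.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in stream if line.strip()]
    meta = {
        "latitude": float(lines[0]),
        "elevation": float(lines[1]),
        "area": float(lines[2]),
    }
    # Variable names follow the four date/time columns, e.g. "prcp(mm/day)".
    variables = lines[3].split()[4:]
    times = []
    data = {name: [] for name in variables}
    for row in lines[4:]:
        fields = row.split()
        year, month, day = (int(f) for f in fields[:3])
        times.append(date(year, month, day))
        for name, value in zip(variables, fields[4:]):
            data[name].append(float(value))
    return meta, times, data


# Usage with a tiny in-memory sample (dummy values, not real CAMELS data).
sample = io.StringIO(
    "42.67\n237\n1234567890\n"
    "Year Mnth Day Hr dayl(s) prcp(mm/day) tmax(C)\n"
    "1980 01 01 12 30172.51 0.00 -6.54\n"
    "1980 01 02 12 30253.15 2.10 -3.90\n"
)
meta, times, data = read_camels_forcing(sample)
print(times[1], data["prcp(mm/day)"])  # 1980-01-02 [0.0, 2.1]
```

Keeping the column names exactly as they appear in the header (units included) avoids guessing a renaming scheme; mapping them to CF/CMOR-style names such as `pr` would be a separate step.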

One main discussion point. Do we:

- update the existing Caravan dataset? This might be confusing, as the forcing is different.
- create a new _forcing.camels? Then we have a bit of repeated code, but the dataset structure is likely different, so that is more or less needed anyway.

Todo:

[x] Use existing code to make NetCDF forcing files for the whole dataset, using all three sources: Daymet, NLDAS and Maurer

Summary from my thesis:

- Daymet has the finest resolution, 1 km x 1 km, whilst the other two sources have a resolution of 1/8th of a degree.
- Daymet aims to reproduce the weather conditions across the whole of the USA.
- NLDAS is more focussed on the soil moisture stores and energy.
- Both Daymet and NLDAS are products by NASA.
- The dataset by Maurer et al. is a baseline for climate predictions.

[x] Also load the characteristics per catchment
[x] Combine the forcing and characteristics
[x] Merge all 671 catchments
[x] Optionally load streamflow
- could use the USGS link already available in ewatercycle
- then again, the data is available, and if we go through the effort we might as well deliver a complete product
- check that errors and flags are handled correctly. See: https://github.com/Daafip/ewatercycle-hbv/issues/59
[x] Add to OPeNDAP
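For the "load characteristics per catchment" and "combine" items, a minimal sketch of merging several attribute tables on the gauge ID is shown here. It assumes the CAMELS attribute files are semicolon-separated text with a `gauge_id` column (as in, e.g., `camels_clim.txt` and `camels_topo.txt`); `merge_attributes` is a hypothetical helper, and the sample values are dummies, not real CAMELS attributes.

```python
import csv
import io


def merge_attributes(*streams, key="gauge_id"):
    """Merge semicolon-separated attribute tables on a shared key column.

    Assumes every table has a `key` column. Returns {gauge_id: {column: value}}.
    Later tables add (or overwrite) columns for gauges already seen.
    """
    merged = {}
    for stream in streams:
        # csv.DictReader takes the column names from the first line.
        for row in csv.DictReader(stream, delimiter=";"):
            # Keep the ID as a string so leading zeros survive.
            gauge = row.pop(key).strip()
            columns = {
                name.strip(): (value or "").strip()
                for name, value in row.items()
                if name is not None  # skip overflow fields without a header
            }
            merged.setdefault(gauge, {}).update(columns)
    return merged


# Usage with dummy tables (values are placeholders, not real attributes).
clim = io.StringIO("gauge_id;p_mean;aridity\n01013500;3.0;0.5\n01022500;4.0;0.6\n")
topo = io.StringIO("gauge_id;elev_mean\n01013500;250.0\n")
attrs = merge_attributes(clim, topo)
print(attrs["01013500"])  # {'p_mean': '3.0', 'aridity': '0.5', 'elev_mean': '250.0'}
```

Gauge IDs are kept as strings on purpose: CAMELS IDs such as 01013500 have leading zeros, which a numeric parse (for example pandas' default type inference) silently drops. Values are also left as strings; converting them to numbers is left to the caller.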

BSchilperoort commented 6 months ago

It's a shame that the forcing data for Caravan has issues. Preferably, we would not have too many different forcing sources in the main ewatercycle package, as we would have to maintain them, especially if the sources conflict (e.g. Caravan's CAMELS and CAMELS separately).

Of course, it is possible to add forcing as a plugin (the same way as adding a model plugin), but that's also not ideal.

If we already have streamflow observations through the USGS system, it would make more sense to have a forcing generator that generates model forcing from Daymet data. The downside is that Daymet is not global.

Daafip commented 6 months ago

"we would have to maintain those."

With this in mind, we can just add an extra dimension to the NetCDF file for the CAMELS basins in the Caravan dataset.

Most catchments will just have ERA5-Land, but the CAMELS-USA basins will then also have Daymet, Maurer and NLDAS.

It would also be interesting for future data assimilation research to be able to run a model with different forcings.
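A minimal sketch of the extra-dimension idea, using plain nested lists instead of NetCDF: every basin gets a slot for every forcing source, and sources without data for a basin are padded with NaN. The source labels, basin IDs and helper names are hypothetical; in an actual NetCDF file or xarray Dataset this would correspond to an extra dimension (here assumed to be called `source`) selected with something like `.sel(source="daymet")`.

```python
import math

# Hypothetical labels for the extra "source" dimension.
SOURCES = ["era5_land", "daymet", "maurer", "nldas"]


def stack_sources(series_by_source, basins, n_times):
    """Build a [source][basin][time] nested list, padding gaps with NaN.

    series_by_source maps source -> {basin_id: list of n_times values}.
    Basins without data for a source (e.g. non-CAMELS basins for Daymet)
    get an all-NaN series, like a padded extra dimension in NetCDF.
    """
    cube = []
    for source in SOURCES:
        per_basin = series_by_source.get(source, {})
        cube.append(
            [list(per_basin.get(basin, [math.nan] * n_times)) for basin in basins]
        )
    return cube


def available_sources(cube, basins, basin):
    """Return the sources with at least one non-NaN value for `basin`."""
    i = basins.index(basin)
    return [
        source
        for source, layer in zip(SOURCES, cube)
        if any(not math.isnan(v) for v in layer[i])
    ]


# Usage with hypothetical basin IDs and dummy values.
basins = ["camels_01013500", "other_basin"]
data = {
    "era5_land": {"camels_01013500": [1.0, 2.0], "other_basin": [0.5, 0.0]},
    "daymet": {"camels_01013500": [1.2, 2.2]},
}
cube = stack_sources(data, basins, n_times=2)
print(available_sources(cube, basins, "other_basin"))  # ['era5_land']
print(available_sources(cube, basins, "camels_01013500"))  # ['era5_land', 'daymet']
```

The trade-off of padding is that most entries for the extra sources would be NaN, since only the CAMELS-USA basins have them; in exchange, every basin has the same shape and a model can switch forcing sources without needing a separate file.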

Peter9192 commented 6 months ago

"update the existing caravan data set? This might be confusing as the forcing is different."

I don't have a complete grasp of all the differences between the datasets you describe, but in general 4TU has a good system for updating data (creating a new version with a different DOI, and I believe also a changelog). Isn't that sufficient to avoid confusion?

Daafip commented 5 months ago

I started working on it here, trying to mimic the structure we used for Caravan so that we can just merge the two.

Daafip commented 5 months ago

I've submitted an update to the data.4tu.nl dataset. Once it is approved and published, I can adjust CaravanForcing accordingly to support the three additional forcings stored in the camels.nc file.

BSchilperoort commented 2 months ago

To work around some issues with the camels.nc file, I have opened #458. It would be nicer if this were fixed in the original file, but the workaround was much more time-efficient for me to implement.

RolfHut commented 1 month ago

Update after talking to Nans Addor: there will be a new release of Caravan with "fixed" Epot calculations. This does not mean that we should not also offer the original CAMELS-USA sources.

eWaterCycle / ewatercycle

Add CAMELS-USA #426