Open Daafip opened 6 months ago
It's a shame that the forcing data for Caravan has issues. Preferably we would not want to have too many different forcing sources in this main ewatercycle package, as we would have to maintain those. Especially if we have conflicting sources (i.e. Caravan's CAMELS and CAMELS separately).
Of course it is possible to add forcing as a plugin (the same way as adding a model plugin), but that's also not ideal.
If we already have the streamflow observations through the USGS system, it would make more sense to have a forcing generator that generates model forcing from Daymet data. The downside of that is that it's not global.
we would have to maintain those.
With this in mind, we can just add an extra dimension to the netCDF file for the CAMELS basins in the Caravan dataset.
Most catchments will just have ERA5-Land, but CAMELS-US will then also have Daymet, Maurer and NLDAS.
Would also be interesting for future data assimilation research to run a model with different forcings.
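To make the extra-dimension idea a bit more concrete, here is a minimal sketch with xarray. Everything in it is an assumption for illustration: the dimension name `data_source`, the variable name `pr`, the basin ids and the random values are made up, and this is not the layout of the actual Caravan or camels.nc files. Basins without a given forcing source are simply filled with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinates: names and ids are illustrative only.
time = pd.date_range("1990-01-01", periods=3, freq="D")
sources = ["era5_land", "daymet", "maurer", "nldas"]
basins = ["camels_01013500", "camelsgb_10002"]

# Start with NaN everywhere, then fill in what each basin actually has.
rng = np.random.default_rng(0)
pr = np.full((len(sources), len(basins), len(time)), np.nan)
pr[0, :, :] = rng.random((len(basins), len(time)))        # ERA5-Land for every basin
pr[1:, 0, :] = rng.random((len(sources) - 1, len(time)))  # extra sources for the CAMELS-US basin only

ds = xr.Dataset(
    data_vars={"pr": (("data_source", "basin_id", "time"), pr)},
    coords={"data_source": sources, "basin_id": basins, "time": time},
)

# Select one forcing source for one basin.
daymet = ds["pr"].sel(data_source="daymet", basin_id="camels_01013500")

# Boolean table of which source is available for which basin.
available = ds["pr"].notnull().any(dim="time")
```

The upside of this layout is that running the same model with different forcings becomes a single `.sel(data_source=...)` call. The cost is a lot of NaN padding for the basins that only have ERA5-Land.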
update the existing caravan data set? This might be confusing as the forcing is different.
I don't have a complete grasp of all the differences between the datasets you describe, but in general 4TU has a good system for updating data (creating a new version with a different DOI, I believe also with a changelog). Isn't that sufficient to avoid confusion?
I started working on it here, trying to mimic the structure we used for caravan so we can just merge the two
I've submitted an update to the data.4tu.nl dataset. Once that is approved and published, I can adjust the CaravanForcing accordingly to support the additional 3 forcings being stored in the camels.nc file.
To work around some issues with the camels.nc file, I have opened #458. It would be nicer if this was fixed in the original file, but this was much more time-efficient for me to implement.
Update after talking to Nans Addor: there will be a new release of Caravan with "fixed" Epot calculations. This does not mean that we should not also offer the original CAMELS-US sources.
The Caravan dataset has already been added in PR https://github.com/eWaterCycle/ewatercycle/pull/407, as discussed in #398. This was relatively simple as we had the NetCDF files available from the source; we only had to combine them. The downside is that it uses ERA5-Land data, for which the evaporation can be quite far from realistic. See this article on the issue.
As part of my thesis I used the original CAMELS-US dataset, which has better forcing. But the data is in text files, split per type: forcing/streamflow/characteristics. I ran models for all 671 catchments, in the process already making the conversion to netCDF. I only used a 5-year period; there is data for the period 1980-2010 (in some cases up to 2014). I used custom forcing in the HBV model to achieve this. It would also be nice to include the catchment characteristics. These are currently spread across different files, and comparing your results to them requires a bit of pandas effort as shown in this messy notebook, or if you want to view it online use this link. Definitely doable, but effort. Loading observations is shown here.
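For anyone unfamiliar with the text format, the parsing step of that conversion could look roughly like the sketch below. It is not the code from my notebook. It assumes a layout of three single-value header lines (latitude, elevation, area), one line of column names, then whitespace-separated daily rows starting with year, month, day and hour. The sample values are invented, and `parse_forcing` is a hypothetical helper name.

```python
import io
from datetime import date

# Invented sample in the assumed layout; not real catchment data.
SAMPLE = """\
44.36
286
573600000.00
Year Mnth Day Hr dayl(s) prcp(mm/day) srad(W/m2) swe(mm) tmax(C) tmin(C) vp(Pa)
1980 01 01 12 30172.51 0.00 153.40 0.00 -6.54 -16.30 171.69
1980 01 02 12 30253.03 1.25 145.27 0.00 -6.18 -15.22 185.94
"""


def parse_forcing(text_file, n_header=3):
    """Parse one forcing text file into header values, dates and data columns."""
    # First token of each header line, e.g. latitude, elevation, area.
    header = [float(next(text_file).split()[0]) for _ in range(n_header)]
    # Column names; the first four are the timestamp parts.
    value_names = next(text_file).split()[4:]
    dates = []
    data = {name: [] for name in value_names}
    for line in text_file:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        year, month, day = (int(p) for p in parts[:3])
        dates.append(date(year, month, day))
        for name, raw in zip(value_names, parts[4:]):
            data[name].append(float(raw))
    return header, dates, data


header, dates, data = parse_forcing(io.StringIO(SAMPLE))
```

From here, `dates` and `data` can be handed to pandas or xarray to build the per-basin dataset and write it out as netCDF.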
Tl;dr: the original CAMELS forcing is better than Caravan's. Code exists, but it still needs some effort to polish.
One main discussion point. Do we:
_forcing.camels? Then we have a bit of repeated code, but the dataset structure is likely different, so kinda needed.
Todo: