eWaterCycle / ewatercycle

Python package for running hydrological models
https://ewatercycle.readthedocs.io/en/latest/
Apache License 2.0

Add Data Assimilation #395

Status: open. Daafip opened this issue 3 months ago.

Daafip commented 3 months ago

As part of my Master's thesis with @RolfHut, I'll be working on a framework to run data assimilation (DA) in eWaterCycle. My final use case is to use particle filters to assimilate streamflow into the HBV model for the CAMELS catchments. However, the goal is a framework to which any type of data assimilation method can be added.
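To make the particle filter use case concrete, here is a minimal sketch of one sequential importance resampling (SIR) step for a single streamflow observation. It assumes a Gaussian observation error with a known standard deviation and uses systematic resampling. The function name and arguments are hypothetical, and this is not the implementation in the framework discussed in this thread.

```python
import numpy as np


def particle_filter_step(states, simulated_q, observed_q, obs_sigma, rng):
    """One sequential importance resampling (SIR) step for one observation.

    states      : (n_particles, n_state) array, one model state per particle
    simulated_q : (n_particles,) streamflow simulated by each particle
    observed_q  : observed streamflow at the same time
    obs_sigma   : assumed standard deviation of the observation error
    """
    # Gaussian log-likelihood of the observation under each particle
    log_w = -0.5 * ((simulated_q - observed_q) / obs_sigma) ** 2
    # Subtract the maximum before exponentiating to avoid underflow
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()

    # Systematic resampling: one random offset, n evenly spaced pointers
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    idx = np.searchsorted(cumulative, positions)
    return states[idx], weights


# Toy usage: 100 particles whose single state maps linearly to streamflow
rng = np.random.default_rng(42)
states = rng.normal(10.0, 2.0, size=(100, 1))
simulated_q = 0.1 * states[:, 0]
new_states, weights = particle_filter_step(
    states, simulated_q, observed_q=1.2, obs_sigma=0.1, rng=rng
)
```

Particles whose simulated streamflow is close to the observation are duplicated and the rest are dropped, so the resampled ensemble shifts toward states consistent with the observed flow.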

There has been some experimentation in the past, but mainly to show that it is possible.

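Looking at the requirements of the different data assimilation methods currently in use (particle filter (PF), ensemble Kalman filter (EnKF), ensemble smoother (ES), ensemble smoother with multiple data assimilation (ES-MDA) and iterative ensemble smoother (IES)), I designed the following workflow:

[Figure: flowchart of the proposed workflow (flowchart_comparison_1_white)]

I've started implementing it here (still experimental, but running).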
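The flowchart itself is not recoverable from this text, but the general loop shared by sequential methods such as PF and EnKF can be sketched: run every ensemble member up to an observation time, collect the model states, hand them to a pluggable DA method, write the updated states back, and continue. Smoothers such as ES, ES-MDA and IES instead work on whole simulation windows and do not fit this loop directly. The toy `LinearReservoir` model, its `update`/`get_state`/`set_state` methods and the `assimilate` callable signature are hypothetical stand-ins, not the eWaterCycle or ewatercycle-da API.

```python
import numpy as np


class LinearReservoir:
    """Hypothetical toy model standing in for an eWaterCycle model.

    Storage S drains as Q = k * S; precipitation P is added every step.
    """

    def __init__(self, storage, k=0.1):
        self.storage = storage
        self.k = k

    def update(self, precip):
        q = self.k * self.storage
        self.storage += precip - q
        return q

    def get_state(self):
        return np.array([self.storage])

    def set_state(self, state):
        self.storage = float(state[0])


def run_ensemble_da(models, precip, observations, assimilate):
    """Advance all members, pause at observation times, assimilate, repeat.

    observations : dict mapping time step -> observed discharge
    assimilate   : any DA method with signature
                   (states, simulated, observed) -> updated states
    """
    for t, p in enumerate(precip):
        simulated = np.array([m.update(p) for m in models])
        if t in observations:
            states = np.stack([m.get_state() for m in models])
            new_states = assimilate(states, simulated, observations[t])
            for m, s in zip(models, new_states):
                m.set_state(s)
    return np.stack([m.get_state() for m in models])


def open_loop(states, simulated, observed):
    # Placeholder "method" that leaves the states unchanged
    return states


members = [LinearReservoir(s) for s in (10.0, 20.0)]
final = run_ensemble_da(
    members, precip=[1.0, 1.0], observations={1: 1.5}, assimilate=open_loop
)
```

Because the DA method is just a callable, adding a new method means writing one function with that signature, leaving the ensemble loop untouched.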

My main question is: where should we integrate this into eWaterCycle?

From a discussion today with @RolfHut and @BSchilperoort, we thought of adding it as an entry point to ewatercycle.util, so that users see it as an extra utility function.

Curious to hear your thoughts on this!

sverhoeven commented 3 months ago

I like the idea of having data assimilation functionality inside the ewatercycle package. Having it in its own module like ewatercycle.assimilation would make it easier to find, use, document and test.

RolfHut commented 3 months ago

Can we use the plugin way of working to have ewatercycle.utils.DA?

BSchilperoort commented 3 months ago

My preference would be to make data assimilation a plugin. This way it can be its own self-contained package, which can make development easier and more streamlined.

The downside is that it's another package that needs to be installed, and that the documentation for ewatercycle becomes more fractured.

Daafip commented 3 months ago

The underlying question, then, is whether we want a data assimilation package that runs eWaterCycle hydrological models (a plugin, ewatercycle.utils.DA) or functionality within eWaterCycle itself that allows data assimilation (ewatercycle.assimilation). In my opinion, both have benefits and drawbacks.

RolfHut commented 3 months ago

(Maybe git is not the right place for this, but...)

I think that describing (in order):

will be an important part of your thesis (a design choices chapter or something like that), so if you write that out, we can make a final decision on what it will be. (I know where my vote is going, but I am more than open to being convinced that I'm wrong.)

Peter9192 commented 3 months ago

Nice discussion.

I'm not so keen on organising plugins under utils. Perhaps create ewatercycle.plugins if you want to go that route? Much like

If we decide to integrate it with ewatercycle itself, we could make it an optional dependency group (and perhaps we could do the same for esmvalcore, @BSchilperoort). Then you would run pip install ewatercycle[da,forcing], much like xarray.
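With an optional dependency group, the core package has to cope with the extra not being installed. A common pattern is a guarded import that tells the user which extra to install. This is a sketch only: the `da` extra is the proposal from this thread, not an existing ewatercycle extra, and `require_extra` is a hypothetical helper.

```python
import importlib


def require_extra(module_name, extra):
    """Import an optional dependency, or explain which extra installs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught too
        raise ImportError(
            f"{module_name!r} is not installed. "
            f"Install it with: pip install ewatercycle[{extra}]"
        ) from err
```

On the packaging side, such a group would be declared under `[project.optional-dependencies]` in pyproject.toml, for example a `da` entry listing `ewatercycle-da`.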

RolfHut commented 3 months ago

This seems like a V2 vs V3 thing: if we want to go the xarray way, we need to overhaul the entire structure, so V3. But @Daafip needs to work with DA before that, so for now I suggest a minor release in V2 (2.1?) where DA is added to the utils.

Peter9192 commented 3 months ago

That's fine with me. It might also make it easier to see which option is most appropriate. Note that adding DA as a plugin does not require a V3, but adding it to 2.1 and then changing it in a later release would. So if we include it in a release, we might want to warn that the DA part is unstable for now.
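One way to warn users that the DA part is unstable is to emit a dedicated warning category whenever an experimental function is called; users can then silence it with the standard `warnings` filters. The `ExperimentalWarning` class, the `experimental` decorator and the `assimilate` placeholder are hypothetical names for illustration.

```python
import functools
import warnings


class ExperimentalWarning(UserWarning):
    """Raised when calling an API that may change without a major release."""


def experimental(func):
    """Mark a function as unstable: warn on every call that its API may change."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is experimental and may change in a future release.",
            ExperimentalWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return func(*args, **kwargs)

    return wrapper


@experimental
def assimilate(states):
    # Hypothetical placeholder for a DA entry point
    return states
```

A dedicated subclass of `UserWarning` is shown by default, unlike `DeprecationWarning`, and can be filtered on its own without hiding other warnings.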

Daafip commented 3 months ago

> But @Daafip needs to work with DA before that

From my side, a standalone Python package is fine for now; that's how I'm currently developing and testing it. For end users it is better to integrate it properly and keep everything together, but for the purposes of my thesis there is no rush.

Daafip commented 3 weeks ago

> From my side a standalone python package is okay

I've developed it as the ewatercycle-da package, which is available through pip. So far, I've tested it extensively with the particle filter data assimilation method and the HBV model.

Further development would include a second or third data assimilation method and testing with more models to ensure they are all compatible. This is, however, beyond my reach for the time being. I've also added some thoughts to consider for future development in the issues section of the ewatercycle-da repo.
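As an illustration of what a second method might look like, here is a sketch of the stochastic ensemble Kalman filter (EnKF) analysis step for one scalar observation such as streamflow, using perturbed observations and sample covariances from the ensemble. It is a generic textbook form under a Gaussian observation-error assumption, not code from ewatercycle-da, and the function name is hypothetical.

```python
import numpy as np


def enkf_analysis(states, simulated_y, observed_y, obs_sigma, rng):
    """Stochastic EnKF analysis step for one scalar observation.

    states      : (n_ens, n_state) forecast ensemble
    simulated_y : (n_ens,) observation predicted by each member
    observed_y  : the measured value, e.g. streamflow
    obs_sigma   : assumed observation-error standard deviation
    """
    n_ens = states.shape[0]
    state_anom = states - states.mean(axis=0)
    y_anom = simulated_y - simulated_y.mean()

    # Sample cross-covariance (state, predicted obs) and predicted-obs variance
    cov_xy = state_anom.T @ y_anom / (n_ens - 1)
    var_y = y_anom @ y_anom / (n_ens - 1)
    gain = cov_xy / (var_y + obs_sigma**2)  # Kalman gain, shape (n_state,)

    # Each member assimilates its own perturbed copy of the observation
    perturbed_obs = observed_y + rng.normal(0.0, obs_sigma, size=n_ens)
    return states + np.outer(perturbed_obs - simulated_y, gain)


rng = np.random.default_rng(0)
prior = rng.normal(10.0, 2.0, size=(200, 1))
posterior = enkf_analysis(
    prior, 0.1 * prior[:, 0], observed_y=1.2, obs_sigma=0.1, rng=rng
)
```

Unlike the particle filter, the EnKF moves every member by a linear update instead of duplicating or dropping members, which is why a framework supporting both needs to allow state updates as well as resampling.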

An added benefit of integrating ewatercycle-da into the main eWaterCycle package, aside from data assimilation itself, is that I've implemented parallelisation by default. This would make running a variety of models at once, or calibrating models, much easier and faster for the user.
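How ewatercycle-da parallelises is not described in this thread. As a generic sketch, running many independent model instances, for example one per calibration parameter set, maps naturally onto a standard-library executor. `run_member` is a hypothetical stand-in for running one model. Threads suit the case where each model runs in a separate process or container and Python mostly waits on it; for CPU-bound pure-Python work, `ProcessPoolExecutor` would be the usual swap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_member(params):
    """Hypothetical stand-in for running one model instance to completion.

    params = (initial storage, recession constant k, number of steps);
    returns the total simulated discharge.
    """
    storage, k, n_steps = params
    total_q = 0.0
    for _ in range(n_steps):
        q = k * storage
        storage -= q
        total_q += q
    return total_q


def run_parallel(param_sets, max_workers=4):
    # map() preserves input order, so results line up with param_sets
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_member, param_sets))
```

For a calibration sweep, `param_sets` would hold one tuple per candidate parameter value, and the ordered results can be compared directly against observations.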