CSHS-CWRA / RavenPy

A Python wrapper to setup and run the hydrologic modelling framework Raven
https://ravenpy.readthedocs.io
MIT License

Calibrating with Ostrich needs more flexibility #110

Closed richardarsenault closed 2 years ago

richardarsenault commented 3 years ago

Calibrating with Ostrich needs to be improved.

1. Right now we are forced to use NSE, but there should be more options (RMSE, KGE, logNSE, etc.).
2. There is no way to force a warm-up period, so the calibration score includes the warm-up period. There should be a way to set a warm-up period length (in days or in timesteps).
3. The speed is horrendous when running on the server. Running 500 evaluations of GR4JCN for 365 days took almost 20 minutes. Nobody will be able to do a 30-year calibration with 5000 evaluations, for example, which is the recommended budget (or at least 2000-3000). I know there is something in the works that could speed this up; otherwise we will need to find an alternative.
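To make the first two requests concrete, here is a minimal sketch of the metrics named above (RMSE, NSE, logNSE, KGE) computed after dropping a warm-up period. It is plain Python and does not use RavenPy or Ostrich; the function names, the `warmup_steps` argument and the `eps` offset in `log_nse` are all hypothetical choices made for illustration, and the KGE shown is the common Gupta et al. (2009) form.

```python
import math
from statistics import mean


def drop_warmup(obs, sim, warmup_steps):
    """Return both series with the first `warmup_steps` timesteps removed."""
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    if not 0 <= warmup_steps < len(obs):
        raise ValueError("warmup_steps must leave at least one timestep")
    return obs[warmup_steps:], sim[warmup_steps:]


def rmse(obs, sim):
    """Root mean squared error (0 is a perfect fit)."""
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (1 is a perfect fit).

    Divides by zero if `obs` is constant; a real implementation should guard that.
    """
    obs_mean = mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den


def log_nse(obs, sim, eps=1e-6):
    """NSE on log-transformed flows; `eps` (an assumption) avoids log(0)."""
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form (1 is a perfect fit)."""
    mo, ms = mean(obs), mean(sim)
    so = math.sqrt(mean((o - mo) ** 2 for o in obs))
    ss = math.sqrt(mean((s - ms) ** 2 for s in sim))
    cov = mean((o - mo) * (s - ms) for o, s in zip(obs, sim))
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up series: the simulation is wrong only during the first two timesteps.
obs = [10.0, 10.0, 1.0, 2.0, 3.0, 4.0]
sim = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
obs_eval, sim_eval = drop_warmup(obs, sim, warmup_steps=2)
print("NSE with warm-up included:", nse(obs, sim))
print("NSE with warm-up excluded:", nse(obs_eval, sim_eval))
```

With the warm-up included, the two bad initial timesteps dominate the score, which is the problem described in point 2.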

julemai commented 3 years ago

Other metrics:

Warm-up:

Speed-up:

How does that sound?

richardarsenault commented 3 years ago

Thanks @julemai !

For steps 1 and 2 I am happy to see that there is a way forward!

For step 3, will this speed things up even if I am only using a vector NetCDF? It is not gridded; there is only 1 station with data in it, so it's a tiny NetCDF...

julemai commented 3 years ago

Re step 3: That is EXACTLY what the script I mentioned is doing :)

richardarsenault commented 3 years ago

OK, but what I mean is that it is taking forever to calibrate using this tiny vector. Not sure how this will help speed up calibration then?

julemai commented 3 years ago

1 station for a lumped model setup (=1 sub basin and 1 HRU). For a distributed basin with N HRUs it would be N time series in that file.

richardarsenault commented 3 years ago

Yes that is what I have now: 1 station, 1 HRU lumped model. And it takes an eternity to calibrate...

Or maybe it's just that we are so used to having fast Matlab code that this just feels slow :D

julemai commented 3 years ago

The bottleneck is then something else. I run 2000 iterations for 10 years in under 2 min using GR4J or HMETS.
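For scale, the two timings quoted in this thread can be put side by side. The sketch below uses only those figures; treating "almost 20 minutes" as 1200 s, "under 2 min" as 120 s, and 10 years as 3650 days are rounding assumptions, and the final extrapolation assumes cost scales linearly with evaluations and simulated days, which may not hold.

```python
# Figures quoted in this thread, rounded (see assumptions in the text above).
server = {"evals": 500, "days": 365, "seconds": 20 * 60}
local = {"evals": 2000, "days": 3650, "seconds": 2 * 60}


def seconds_per_eval(run):
    """Average wall-clock seconds per model evaluation."""
    return run["seconds"] / run["evals"]


def seconds_per_simulated_day(run):
    """Average wall-clock seconds per evaluation per simulated day."""
    return run["seconds"] / (run["evals"] * run["days"])


print(seconds_per_eval(server))  # 2.4
print(seconds_per_eval(local))   # 0.06

# How much slower the server run is per simulated day (about 400x).
ratio = seconds_per_simulated_day(server) / seconds_per_simulated_day(local)
print(round(ratio))

# Naive linear extrapolation of the server rate to 30 years x 5000 evaluations.
hours = seconds_per_simulated_day(server) * 5000 * 30 * 365 / 3600
print(round(hours), "hours")
```

A gap of that size between two runs of comparable lumped models is consistent with a fixed per-evaluation overhead rather than model run time, though the numbers alone do not say where it comes from.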

richardarsenault commented 3 years ago

OK, so we'll need to figure that out and see how we can make it faster on the server. Thanks for the input!

julemai commented 3 years ago

Maybe you could share your setup with me and I can dig into it and try to find why it is so slow?

julemai commented 3 years ago

But I remember we ran into this before: the script was waiting for a system response, and it waited forever because the calibrations with huge budgets were not actually running...

richardarsenault commented 3 years ago

Hmmm the setup is on the pavics.ouranos.ca server, so I am not sure how I can extract and share the whole setup. I could give you the notebook that you could run in your own account, though, if you want to replicate?

Also, yes we had talked about it but this time it's actually completing, it's just so, so slow...

julemai commented 3 years ago

Hm. Maybe @cjauvin or @huard could help with extraction? If I don't see the actual setup it is really hard to tell what might go wrong.

huard commented 3 years ago

@richardarsenault

Please create a benchmark folder (parallel to tests) and put your example in there.

richardarsenault commented 3 years ago

I've created a branch called "benchmark_notebooks", where there is now a benchmark folder and the offending notebook. Note that I reduced the number of trials to 50 and days to 200 to give a semblance of reasonable results and not take too much time. Try setting it at 365 days and 500 evals...

cjauvin commented 3 years ago

I have just run this notebook locally, and apart from the fact that it does not use the latest version of RavenPy (we use model.config.rvh instead of model.rvh now), it executes in a fraction of a second on my desktop. I suspect there's something I didn't do correctly to reproduce the problem, though.

cjauvin commented 3 years ago

By "this notebook" I mean your newly created benchmark notebook: https://github.com/CSHS-CWRA/RavenPy/blob/benchmark_notebooks/benchmark/06_Raven_calibration.ipynb

richardarsenault commented 3 years ago

Yeah, all code runs much slower on the server than locally... even doing a few imports takes 10 seconds or more after starting a kernel.
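One way to put a number on the slow imports is to time them in a fresh kernel on both machines. This is a minimal sketch using only the standard library; `time_import` is a hypothetical helper, and `json` and `csv` are stand-ins to be replaced with the packages actually imported in the notebook.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Import `module_name` and return (elapsed_seconds, was_already_loaded).

    Only the first import in a fresh interpreter is meaningful: afterwards the
    module is cached in sys.modules and importing it again is nearly free.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return elapsed, already_loaded


# Stand-in modules; swap in the notebook's real imports.
for name in ("json", "csv"):
    elapsed, cached = time_import(name)
    note = " (already cached)" if cached else ""
    print(f"{name}: {elapsed:.4f} s{note}")
```

For a per-module breakdown, CPython 3.7+ also accepts `python -X importtime -c "import json"`, which prints the cumulative import time of every module loaded.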

huard commented 2 years ago

Trying to synthesize the TODOs:

huard commented 2 years ago

https://github.com/Ouranosinc/pavics-sdi/issues/249