eWaterCycle / ewatercycle

Python package for running hydrological models
https://ewatercycle.readthedocs.io/en/latest/
Apache License 2.0

Make it possible to run models in parallel #78

Open Peter9192 opened 3 years ago

Peter9192 commented 3 years ago

For the use case where we run two (or more) experiments with minor differences in one notebook, it would be really nice if they could execute in parallel. E.g.

discharge1 = []
while reference_model.time < reference_model.end_time:
    reference_model.update()
    discharge = reference_model.get_value("discharge")
    discharge1.append(discharge)

discharge2 = []
while experiment_model.time < experiment_model.end_time:
    experiment_model.update()
    experiment_model.set_value("soil_moisture", .........)
    discharge = experiment_model.get_value("discharge")
    discharge2.append(discharge)

There are different ways to accomplish this. It would be nice if we could offer users an easy way through the ewatercycle interface, but this example shows it should at least be flexible enough to allow custom statements in the second loop; a sketch of one such approach follows below.
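For illustration, one way this could be done with standard-library threads (a sketch only; run_reference and run_experiment are hypothetical helpers, not part of the ewatercycle interface):

# Hypothetical sketch: run the two experiment loops concurrently with
# standard-library threads. Not part of the ewatercycle API.
from concurrent.futures import ThreadPoolExecutor

def run_reference(model):
    series = []
    while model.time < model.end_time:
        model.update()
        series.append(model.get_value("discharge"))
    return series

def run_experiment(model):
    series = []
    while model.time < model.end_time:
        model.update()
        model.set_value("soil_moisture", ...)  # placeholder for custom per-step statements
        series.append(model.get_value("discharge"))
    return series

with ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(run_reference, reference_model)
    future2 = pool.submit(run_experiment, experiment_model)
    discharge1, discharge2 = future1.result(), future2.result()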

Peter9192 commented 3 years ago

Another use case: if I run multiple model instances on a 24-core node on Cartesius like so:

for model in models:
    model.initialize()
    while model.time < model.end_time:
        model.update()

What is the most straightforward way to execute this job in parallel, e.g. 1 model instance per core?

Could we simply use multiprocessing? Or is there a better way? And would it be possible to add an example of this use case to the documentation?
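As a sketch of the multiprocessing route (assuming each worker process builds its own model instance, since model objects may not be picklable; setup_model and configs are hypothetical placeholders):

# Hypothetical sketch: one model instance per worker process.
# setup_model() and configs are placeholders, not ewatercycle API.
from multiprocessing import Pool

def run_model(config):
    model = setup_model(config)  # build the instance inside the worker
    model.initialize()
    while model.time < model.end_time:
        model.update()
    return model.get_value("discharge")

if __name__ == "__main__":
    with Pool(processes=24) as pool:  # e.g. one worker per core on the node
        results = pool.map(run_model, configs)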

sverhoeven commented 3 years ago

Multiprocessing will still be within the same machine. If we implement something parallel, I would like it to also be distributed, using something like https://docs.dask.org/, https://ray.io or https://docs.celeryproject.org/
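For example, the same worker function could be submitted through dask's distributed client, which scales from one machine to a cluster (a sketch; setup_model and configs are again hypothetical placeholders):

# Hypothetical sketch using dask.distributed, which can scale from a
# single machine to a cluster. setup_model()/configs are placeholders.
from dask.distributed import Client

def run_model(config):
    model = setup_model(config)  # build the model inside the worker
    model.initialize()
    while model.time < model.end_time:
        model.update()
    return model.get_value("discharge")

client = Client()  # local by default; pass a scheduler address to distribute
futures = client.map(run_model, configs)
results = client.gather(futures)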

BSchilperoort commented 6 months ago

Running models in parallel works, but only when no data has to be serialized and transferred to other processes (i.e. no dask or multiprocessing, but threading works). Edit: this might have been specific to the Wflow.jl case, as it depended on connecting to Julia.

However, as the models run inside docker/apptainer containers, that isn't a big hurdle.

@Daafip would you be able to add an example if/when you have this working?

Daafip commented 6 months ago

Looking at Data Assimilation (DA), running models in parallel (faster!) would be great. I'm currently working on this, first implementing a set of classes to run DA. Example here.

I have a crude example using tqdm working in a notebook, where many models compute their output in parallel. From early testing I didn't find much benefit yet, but I haven't put in much effort so far.

I'm following the 'Parallel Programming with Python' NLeSC course next week and will take a good look after that.

BSchilperoort commented 6 months ago

From early testing I didn't find much benefit from that as of yet

I would expect that in your case, as the runtime of the HBV model is very short. It should be different when models take longer to compute their .update(), for example distributed models. If you would like to test whether the parallelization actually works, you could add a sleep(5) statement inside the model code.
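A rough timing check could look like this (a hypothetical sketch; time.sleep stands in for a patched, expensive model.update()): with working parallelization, two 5-second updates finish in about 5 s of wall-clock time rather than 10 s.

# Hypothetical timing check: with working parallelization, two 5 s
# "updates" run in ~5 s of wall-clock time instead of ~10 s.
import time
from concurrent.futures import ThreadPoolExecutor

def slow_update(name):
    time.sleep(5)  # stand-in for an expensive model.update()
    return name

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(slow_update, ["reference", "experiment"]))
print(f"elapsed: {time.perf_counter() - start:.1f} s")  # ~5 s if parallel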

Daafip commented 5 months ago

would you be able to add an example if/when you have this working?

I'm currently looking into this. My proposed structure would look something like the diagram below, where the greyed-out part applies only when running data assimilation, whilst the rest can be used when just running an ensemble of models.

[diagram: method3_no_DA]

Daafip commented 5 months ago

If you would like to test if the parallelization actually works you could add a sleep(5) statement inside the model code.

Added a test model instead; I won't add it to PyPI as it's more for development purposes. Maybe slightly overkill, but it allows testing inside a docker container too, which adds some overhead. Can be found here
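To illustrate the idea (this is not the linked test model, just a minimal hypothetical stand-in), a dummy model whose update() only sleeps lets you measure the overhead of the parallel machinery in isolation:

# Hypothetical minimal stand-in model for development/timing tests only;
# the real test model linked above lives in its own repository.
import time

class SleepModel:
    def __init__(self, end_time=10, dt=1, delay=0.5):
        self.time = 0
        self.end_time = end_time
        self.dt = dt
        self.delay = delay

    def initialize(self):
        self.time = 0

    def update(self):
        time.sleep(self.delay)  # simulate an expensive time step
        self.time += self.dt

    def get_value(self, name):
        return 0.0  # dummy output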

Got a 10x theoretical speed-up now using dask.delayed; example here.
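The dask.delayed pattern referred to here looks roughly like this (a sketch; `ensemble` is assumed to be a list of already-initialized model instances):

# Sketch of the dask.delayed pattern: wrap each model's update so all
# ensemble members advance one time step in parallel. The default
# scheduler for delayed is threads, so the models mutate in place.
import dask

@dask.delayed
def step(model):
    model.update()
    return model.get_value("discharge")

while ensemble[0].time < ensemble[0].end_time:
    discharges = dask.compute(*[step(m) for m in ensemble])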

Next, I'll try to speed up the rest of the data assimilation steps, as getting and setting the whole state vector could also be optimized.

BSchilperoort commented 5 months ago

Thanks for sharing the progress!

Got a 10x theoretical speed-up now using dask.delayed

Did you use Dask's default scheduler? There is also the "distributed" dask client, which they generally recommend nowadays: https://docs.dask.org/en/stable/scheduler-overview.html The exact speed-up with dask will depend on the Dask scheduler & configuration, but also on the system you're on: if you have 10 threads on your CPU, the maximum speed-up you can get is 10x.
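For reference, the scheduler can be chosen explicitly, either per compute call or by creating a distributed Client, which then becomes the default (a sketch; `tasks` stands for a list of delayed objects as in the earlier sketch):

# Sketch: choosing the dask scheduler explicitly.
import dask
from dask.distributed import Client

results = dask.compute(*tasks, scheduler="threads")  # explicit per-call choice

client = Client(n_workers=10, threads_per_worker=1)  # local distributed cluster
results = dask.compute(*tasks)  # now runs on the distributed scheduler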

Daafip commented 5 months ago

Did you use Dask's default scheduler?

Yeah for now.

Thanks, I will look into the customisation; that's necessary for larger applications, this was more a proof of concept. Fair point on the 10 threads; I think it defaulted to 12, but I only ran short tests, so the amount of overhead is still significant.

_Edit: it was indeed limited by num_workers; a quick fix, changed in the example & implementation._

For DA (& other applications), getting and setting states also induces quite some runtime, so I will look into parallelising this too.
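That could follow the same delayed pattern, e.g. (a sketch; `ensemble` and the variable name "state_vector" are assumptions for illustration):

# Sketch: gather every member's state vector concurrently.
import dask

states = dask.compute(*[dask.delayed(m.get_value)("state_vector") for m in ensemble])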

Daafip commented 5 months ago

Now have it working using the ewatercycle_DA package. Example in this readme. Currently it only works with the HBV & Lorenz models, but this will change soon.

Daafip commented 5 months ago

Now works with all models installed by the user. An example of working with Marrmot is shown here