Daafip / eWaterCycle-DA

Data assimilation functions to be used in combination with the hydrological modelling system eWaterCycle
https://ewatercycle-da.readthedocs.io/en/latest
Apache License 2.0

Parallel model runs: separate forcing files for each model #35

Open BSchilperoort opened 7 months ago

BSchilperoort commented 7 months ago

In https://github.com/eWaterCycle/leakybucket-bmi/issues/3 we debugged an issue related to resource locking when models are trying to access a shared file. In this specific case it is a model where it's possible to easily change the model code (as you're the creator of the model, @Daafip ), but for a general solution it would be best that models do not share the same resources on disk.

Therefore I propose that the default behavior of ewatercycle-da is to have separate forcing objects for separate models (at least for parallel runs).

Still todo:

Daafip commented 7 months ago

to not share the same resources on disk

For now I've left this choice up to the user, but it is definitely something to document and make clear. I'd like to keep supporting a single forcing object as an option for new users who, say, just want to run a quick sensitivity analysis.

at least for parallel runs

Currently I've implemented parallel runs only when running models in Docker; for local models it seems to introduce too much overhead. But I still encountered the locking issue.

BSchilperoort commented 7 months ago

For now i've left this choice up to the user but definitely something to document and make clear.

I think having it as an option is OK, while by default using separate forcing files.

A workaround could be to duplicate the forcing object in case just one is provided?

I would not call this a workaround, but rather the desired functionality.
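The duplication idea discussed above could look something like the sketch below. Note that `expand_forcing_dirs`, the directory layout, and the file names are all hypothetical illustrations, not the ewatercycle-da API:

```python
import shutil
import tempfile
from pathlib import Path


def expand_forcing_dirs(forcing_dir: Path, n_ensembles: int, target_root: Path) -> list[Path]:
    """Copy one forcing directory once per ensemble member.

    Hypothetical helper: each member gets its own on-disk copy, so
    parallel model runs never contend for the same files.
    """
    copies = []
    for i in range(n_ensembles):
        dst = target_root / f"ensemble_{i}"
        shutil.copytree(src=forcing_dir, dst=dst)  # creates parents as needed
        copies.append(dst)
    return copies


# Minimal demonstration with a throwaway forcing directory:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "forcing"
    src.mkdir()
    (src / "forcing.txt").write_text("t,P,Q\n0,1.0,0.5\n")
    dirs = expand_forcing_dirs(src, n_ensembles=3, target_root=root / "members")
    print(len(dirs))  # 3
    print(all((d / "forcing.txt").exists() for d in dirs))  # True
```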

Currently I've implemented parallel runs only when running models in Docker

Yes, I would keep it that way. Local models should only be used for developing/testing whether a model works as an eWaterCycle plugin.

Daafip commented 7 months ago

Should be an easy enough fix, and it will also ensure this works nicely with DA.generate_forcing, which allows forcing to be generated in parallel.

I'll discuss today how much more development I'll be doing towards the end of my thesis.

Edit: not much

Daafip commented 6 months ago

but for a general solution it would be best that models do not share the same resources on disk.

So the issue was with using the same forcing object. But when generating the forcing I also hit problems: if forcings shared the same source .txt file containing the timeseries, it would also fail:

...
  File ".../ewatercycle_HBV/model.py", line 77, in _make_cfg_file
    self.forcing.from_camels_txt()
  File "..../ewatercycle_HBV/forcing.py", line 204, in from_camels_txt
    ds = xr.Dataset(data_vars=df,
  File ".../xarray/core/dataset.py", line 694, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File ".../xarray/core/dataset.py", line 423, in merge_data_and_coords
    return merge_core(
...

So even shared source files can cause issues, it seems...

edit: I've now tried adjusting this so there is one txt file per forcing object, and that works fine. When testing on Windows Subsystem for Linux (not ideal), I still have a memory issue causing instability.

Edit edit: this seems to be a memory issue on WSL; when running (slower) on SURF, it doesn't occur. IO is just hit first as it loads the file into memory.
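The "one txt file per forcing object" fix mentioned above might be sketched as follows. The helper name `split_source_file` and the directory layout are illustrative assumptions, not code from this repository:

```python
import shutil
import tempfile
from pathlib import Path


def split_source_file(shared_txt: Path, n_forcings: int, work_dir: Path) -> list[Path]:
    """Give each forcing object its own private copy of the source txt.

    Hypothetical sketch: instead of pointing every forcing at one shared
    CAMELS-style txt file (which failed when read concurrently), copy the
    file into a separate directory per forcing.
    """
    paths = []
    for i in range(n_forcings):
        member_dir = work_dir / f"forcing_{i}"
        member_dir.mkdir(parents=True)
        paths.append(Path(shutil.copy(shared_txt, member_dir / shared_txt.name)))
    return paths


# Minimal demonstration with a throwaway timeseries file:
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    shared = root / "camels.txt"
    shared.write_text("date P Q\n2000-01-01 1.0 0.5\n")
    copies = split_source_file(shared, 4, root / "work")
    print(len(copies))  # 4
    print(copies[0].read_text() == shared.read_text())  # True
```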

Daafip commented 6 months ago

After more testing, it is a memory issue with WSL. Running bare Ubuntu on my machine used only a fraction of the memory WSL consumed (which makes sense).

Daafip commented 6 months ago

it is a memory issue with WSL

For this reason I'll leave it as is for the moment, but it's still something to look at. I've therefore moved the todo points up to your initial comment so they show nicely on GitHub (sorry for editing your comment, Bart).

BSchilperoort commented 6 months ago

it is a memory issue with WSL

Glad that you figured it out!

For this reason I'll leave it as is for the moment, but it's still something to look at. I've therefore moved the todo points up to your initial comment so they show nicely on GitHub (sorry for editing your comment, Bart).

No worries haha.

HBVForcing will have to change (long-term todo) to accommodate this

We can also add a copy method to the base class (ewatercycle's DefaultForcing). Forcings should be self contained in a directory, and should be movable (only relative paths), so you would just have to copy the directory to a new location.

for i in range(n_ensembles):
    target_dir = forcing.copy(ensemble_forcings_dir / f"ensemble_{i}")

which internally would call:

import shutil
from pathlib import Path

class DefaultForcing:
    ...
    def copy(self, target: Path | str) -> Path:
        """Copy this forcing's self-contained directory to a new location."""
        return Path(shutil.copytree(src=self.directory, dst=target))