Deltares / Ribasim

Water resources modeling
https://deltares.github.io/Ribasim/
MIT License
Case management #47

Open visr opened 1 year ago

visr commented 1 year ago

We need to be able to easily handle different variations of model runs. Ribasim 7 uses cases, scenarios and measures / measurement actions. Today I was discussing how to approach this for Ribasim.jl with @Hofer-Julian, and we came up with an approach for how to handle cases. Scenarios and measures functionality would be part of it, but let's just call it cases.

Currently a single model run can be started like this:

ribasim config.toml

This can be seen as a reference run / base case. If we want to run other cases based on this, currently the only option would be to have many copies of config.toml with slightly different contents. This can be scripted, but is not convenient for users to handle directly if the number of cases increases.

To support cases, we could have a second TOML file, which we can call cases.toml for now. The config.toml contains the complete configuration for the reference run, and cases.toml maps each case ID (a TOML key) to the subset of configuration that case overrides. This subset can differ per case.
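A minimal sketch of how a case's overrides could be applied on top of the reference configuration, in Python for illustration only (Ribasim.jl itself is Julia). It assumes both files are already parsed into nested dicts and that nested tables are merged key by key rather than replaced wholesale; the proposal does not settle that. The function name `apply_case` and all example keys and values are hypothetical.

```python
from copy import deepcopy


def apply_case(reference: dict, overrides: dict) -> dict:
    """Return a new config: `reference` with `overrides` applied recursively."""
    merged = deepcopy(reference)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested table: merge key by key instead of replacing the table.
            # (Assumption: the proposal does not specify this behaviour.)
            merged[key] = apply_case(merged[key], value)
        else:
            # Scalar, array, or key absent from the reference: the case wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical reference configuration, as if parsed from config.toml.
reference = {
    "endtime": "2020-01-01",
    "node": "node.arrow",
    "run_modflow": False,
    "modflow": {"simulation": "../data/mfsim.nam", "timestep": 1},
}

# Hypothetical overrides for one case, as if parsed from cases.toml.
test_irrigation = {
    "run_modflow": True,
    "modflow": {"simulation": "../data/test-irrigation/mfsim.nam"},
}

case_config = apply_case(reference, test_irrigation)
```

The reference dict is left untouched, so the same base can be reused for every case in the file.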

To keep the config.toml as the full configuration file, such that we can keep the ribasim config.toml API, we need to add an option like cases = "./cases.toml" to config.toml. If this is not present, only the reference case is run. If it is specified, ribasim config.toml will by default run the reference case as well as all the cases in cases.toml. The cases.toml file should also make it easy to turn cases off and on. Commenting out cases would work, but probably we should also support specifying them as a TOML array of case IDs. For this we start cases.toml with a meta section.

[meta]
include = [
  "add-reservoir",
  "test-irrigation",
]

If meta.include is an empty array ([]), no cases are run, and if it is not set, all cases are run.
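The selection rule can be sketched as follows, in Python for illustration only. It assumes cases.toml is already parsed into a dict and that every top-level table other than `meta` is a case. The function name `select_cases` and the choice to raise an error for an unknown case ID are assumptions, not part of the proposal.

```python
def select_cases(cases_doc: dict) -> list:
    """Return the case IDs to run, following the meta.include rule."""
    meta = cases_doc.get("meta", {})
    # Every top-level table except "meta" defines a case.
    defined = [key for key in cases_doc if key != "meta"]

    if "include" not in meta:
        # include not set: run all defined cases.
        return defined

    include = meta["include"]
    unknown = [case for case in include if case not in defined]
    if unknown:
        # Assumption: a typo in include should fail loudly, not be skipped.
        raise KeyError(f"meta.include names undefined cases: {unknown}")
    # An empty include list falls through here and selects nothing.
    return list(include)


# Hypothetical parsed cases.toml.
cases_doc = {
    "meta": {"include": ["add-reservoir", "test-irrigation"]},
    "local-area": {"ids": [14908, 14909, 14910, 14784]},
    "add-reservoir": {"node": "node_reservoir.arrow"},
    "test-irrigation": {"run_modflow": True},
}
selected = select_cases(cases_doc)
```

One consequence of reserving the `meta` key this way is that no case can itself be called `meta`.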

An example cases.toml is given below:

[local-area]
ids = [14908, 14909, 14910, 14784]

[add-reservoir]
endtime = 2019-01-03
node = "node_reservoir.arrow"

[test-irrigation]
waterbalance = "waterbalance/test-irrigation.arrow"
run_modflow = true
[test-irrigation.modflow]
simulation = "../data/test-irrigation/mfsim.nam"

This defines three cases: local-area, add-reservoir and test-irrigation. TOML also allows specifying the local-area case on one line, like: local-area.ids = [14908, 14909, 14910, 14784], though it's probably best to avoid mixing these styles.

To avoid having to edit all the output paths per case, e.g. changing the path for case-A to "cases/case-A/waterbalance.arrow" or similar, we could write output to "cases/case-A" by default. This would only apply to output, not input. We can consider grouping the output paths in config.toml together in an output section.
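A small sketch of that default, in Python for illustration only. The function name `output_path` and the `root` parameter are hypothetical, and leaving the reference run's output path unchanged is an assumption, since the proposal only describes where case output goes.

```python
from pathlib import PurePosixPath


def output_path(filename: str, case_id=None, root: str = "cases") -> str:
    """Return the default location of an output file for a run."""
    if case_id is None:
        # Reference run: keep the path exactly as given in config.toml.
        # (Assumption: the proposal only covers output of cases.)
        return filename
    # Case run: prefix the output path with <root>/<case_id>/.
    return str(PurePosixPath(root) / case_id / filename)


case_output = output_path("waterbalance.arrow", case_id="case-A")
reference_output = output_path("waterbalance.arrow")
```

Input paths would not pass through this function, which matches the intent that only output is redirected per case.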

evetion commented 1 year ago

I like these REPs (Ribasim Enhancement Proposals). ❤️

cases, scenarios and measures / measurement actions

Might be good to detail what those mean, and how that would map to Ribasim actions here.

such that we can keep the ribasim config.toml API

We don't need to be backwards compatible at this moment in time. We could also introduce a ribasim run subcommand or a new ribasim-scenario command. That makes the config and explanation easier (will this command run 1 or many simulations?), as one command just takes one toml schema, which probably will help with the cases/scenarios/measures definition.

Besides, I think it's odd to point from config.toml to cases.toml, which itself implicitly points back to config.toml. Maybe it's more logical to have a case.toml which has a key

[meta]  # or even on a per case basis?
overrides = "config.toml"
key = "value" # key to override with value in config.toml

This would also enable cases to depend on multiple different config.tomls.
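A sketch of how such a table could be read, in Python for illustration only. It assumes the table is already parsed into a dict, that `overrides` is the only reserved key, and that its path is resolved relative to the directory of the case file; none of this is settled in the thread. The function name `split_case_table` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def split_case_table(table: dict, case_file: str):
    """Separate the base-config pointer from the keys that override it."""
    # Everything except the reserved "overrides" key is an override value.
    overrides = {key: value for key, value in table.items() if key != "overrides"}
    # Assumption: the pointer is relative to the directory of the case file.
    base_config = PurePosixPath(case_file).parent / table["overrides"]
    return str(base_config), overrides


# Hypothetical parsed table from a case file at models/basin/case.toml.
table = {"overrides": "config.toml", "key": "value"}
base_config, overrides = split_case_table(table, "models/basin/case.toml")
```

Because each table carries its own pointer, two tables in one file could name different base configurations.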

Lastly, to help with the output per case, wouldn't it be better to always require a "name" for a config.toml, that will automatically be the name of the output folder? A case would just override "name", which would fix the folder output. That would also deprecate setting "waterbalance" (it would always be called waterbalance.arrow)?

Hofer-Julian commented 1 year ago

We could also introduce a ribasim run subcommand or a new ribasim-scenario command.

Sounds good. For the new command, we would have to check whether PackageCompiler is smart enough to produce two binaries with shared dependencies. If not, a subcommand would be the better option.

This would also enable cases to depend on multiple different config.tomls.

I wonder if that is a real use case, to be honest.

Lastly, to help with the output per case, wouldn't it be better to always require a "name" for a config.toml, that will automatically be the name of the output folder? A case would just override "name", which would fix the folder output. That would also deprecate setting "waterbalance" (it would always be called waterbalance.arrow)?

Sounds good to me

visr commented 1 year ago

Might be good to detail what those mean, and how that would map to Ribasim actions here.

Cases, scenarios and measures / measurement actions all map to the cases as defined here. Scenarios and measures were separated in terms of natural and human influences, but this distinction is not always so clear, so I think it is better to flatten them to a single concept.

We don't need to be backwards compatible at this moment in time.

I agree. The main consideration was to retain a single TOML file that contains or references a complete model definition (or multiple ones with cases).

it's odd to point from config.toml to cases.toml, which itself implicitly points back to config.toml

Yeah I thought about this, although it is consistent in the sense that config.toml references everything, and nothing references config.toml. For all files referenced in config.toml (the Arrow tables and cases.toml), it is true that they are closely tied to the model (config.toml), and cannot easily apply to other models. Normally speaking they would be kept together in the same directory.

For the rest I agree with @Hofer-Julian that it's good to look into subcommands, and that name could be good to add. Although I think that I'd like to keep the concept of a name separate from the input and output folders by default. See also https://github.com/Deltares/Wflow.jl/pull/175. A name in config.toml can act more like an identifier of your model in a set of models, probably accompanied by a version.