astro-turing / Integrating-diagenetic-equations-using-Python

Reactive-transport model simulating formation of limestone-marl alternations
Apache License 2.0

Provide for runs covering a volume of parameter space. #20

Open HannoSpreeuw opened 1 year ago

HannoSpreeuw commented 1 year ago

This request from @EmiliaJarochowska needs some discussion, because there are a number of aspects to decide on. It is a big topic, and implementing it properly will take quite some work.

One could create a special version of this parameter file that specifies ranges of values, with some bin size for the sampling, instead of single values for some parameters. Every combination of parameters could then be mapped onto a single hdf5 file. A special setup is also needed to make sure that the runs for all parameter combinations are executed consecutively in an automated way. Or in parallel, on multiple nodes of a cluster; that would be faster, but would require an MPI version of this codebase.
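A minimal sketch of what the consecutive-sweep setup could look like. The parameter names, ranges, and bin choices below are placeholders, not values from the actual config file, and `run_model` is a hypothetical entry point:

```python
from itertools import product

# Placeholder parameter ranges; names and sampled values are
# assumptions for illustration, not taken from the real config file.
param_ranges = {
    "sedimentation_rate": [0.01, 0.02, 0.05],
    "dissolution_constant": [0.1, 0.5],
    "initial_porosity": [0.5, 0.6, 0.7],
}

def parameter_grid(ranges):
    """Yield one dict per combination of sampled parameter values."""
    names = list(ranges)
    for values in product(*(ranges[n] for n in names)):
        yield dict(zip(names, values))

# Execute every combination consecutively, one output file each.
for i, params in enumerate(parameter_grid(param_ranges)):
    outfile = f"run_{i:04d}.h5"   # one hdf5 file per combination
    # run_model(params, outfile)  # run_model is hypothetical
```

With the three placeholder ranges above this yields 3 × 2 × 3 = 18 runs; the Cartesian product grows quickly, which is exactly why the data-reduction question matters.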

However, one could end up with a ton of hdf5 files, which would require further data reduction and analysis before any scientific conclusion can be drawn.

Ideally, instead of a ton of hdf5 files, one might prefer a single multidimensional graph depicting the effect of the parameter variations on the depth profiles of the five fields, i.e. on aragonite and calcite compositions, on the pore water concentrations of the two ions and on the porosity. Exactly how this graph should be compiled needs some thought.

But perhaps a ton of hdf5 files as output is good enough as a first step.

EmiliaJarochowska commented 1 year ago

I am aware this is a lot of work, so I'd love to discuss it first, including @jhidding, because there are some aspects of his original config file I don't grasp (and I cannot run the Fortran code that uses it, so I cannot try it out to understand them).

The motivation for this request is as follows:

  1. Testing the code when we try to reproduce the oscillations - this does not require scanning a range of parameters, only storing the parameters (ideally together with the output in an hdf5 file). We often wanted to go back to a previous run to recall what a given modification changed. Not having the parameters (and output) stored in a systematic way quickly led to confusion.
  2. Scanning a set of parameters will only really be needed if we reproduce oscillations. So we can decide against that for now and focus on the model itself and having reproducible runs with a record of parameters. The only consideration is whether it is worth rewriting it for a range of parameters if we agree on 1.

> Or in parallel, on multiple nodes of a cluster. That would be faster, but would require an mpi version of this codebase.

It is not slow as of now. Maybe will become slow for certain values of parameters, but, again, we'll only run ranges of parameters for long model times if we get oscillations. So perhaps not priority now.

> Ideally, instead of a ton of hdf5 files, one might prefer a single multidimensional graph depicting the effect of the parameter variations on the depth profiles of the five fields, i.e. on aragonite and calcite compositions, on the pore water concentrations of the two ions and on the porosity. Exactly how this graph should be compiled needs some thought.

Why do we need to end up with a ton of hdf5 files? I thought we could use this format to store outputs from multiple runs in one file, possibly even with the input parameters, in a structured way.

Alternatively we can use our RDM team to set up metadata for the ton of files - they really enjoy that sort of work but it's a bit experimental.

We have thought about such a graph already! Specifically about a pipeline for detecting oscillations. This was a bit optimistic.

I think the graph is something @NiklasHohmann and I can work on, as it requires much less skill than the work that needs you. And for us it's good learning. Of course, only if you agree.

HannoSpreeuw commented 1 year ago
> 1. We often wanted to go back to a previous run to recall what a given modification changed. Not having the parameters (and output) stored in a systematic way quickly led to confusion.

I guess that will be covered by #19

HannoSpreeuw commented 1 year ago

> It is not slow as of now. Maybe will become slow for certain values of parameters, but, again, we'll only run ranges of parameters for long model times if we get oscillations. So perhaps not priority now.

Okay, thanks for clarifying that. Perhaps I should add a "Priority low" label to this issue.

HannoSpreeuw commented 1 year ago

> Why do we need to end up with a ton of hdf5 files? I thought we could use this format to store outputs from multiple runs in one file, possibly even with the input parameters, in a structured way.

Oh yeah, you're right. That is indeed possible.
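A sketch of how storing multiple runs in one file could look with h5py: one group per run, with the input parameters kept as attributes right next to the output datasets. The layout and names are assumptions for illustration, not the project's actual format:

```python
import h5py  # common Python bindings for HDF5

def save_run(h5path, run_name, params, fields):
    """Append one model run to a single hdf5 file.

    params: dict of scalar input parameters, stored as group attributes.
    fields: dict of 1-D depth profiles (e.g. the five model fields),
            stored as datasets inside the run's group.
    The one-group-per-run layout is a sketch, not a fixed convention.
    """
    with h5py.File(h5path, "a") as f:        # "a": create or append
        grp = f.create_group(run_name)
        for name, value in params.items():
            grp.attrs[name] = value           # inputs live next to outputs
        for name, profile in fields.items():
            grp.create_dataset(name, data=profile)

# Hypothetical usage: two runs, same file.
save_run("runs.h5", "run_0000", {"k": 0.1}, {"porosity": [0.5, 0.55, 0.6]})
save_run("runs.h5", "run_0001", {"k": 0.2}, {"porosity": [0.5, 0.52, 0.58]})
```

Keeping the parameters as attributes of the same group as the output also covers point 1 above: any past run can be reopened with its exact inputs attached.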

HannoSpreeuw commented 1 year ago

> I think the graph is something @NiklasHohmann and I can work on, as this requires much less skill than you have. And for us it's good learning. Of course if you agree.

Absolutely. Please go ahead. I mean even with the output from all the runs stored in a single hdf5 file, data reduction, i.e. some form of combining the data from all the runs, is needed to enable an analysis.

EmiliaJarochowska commented 1 year ago

> Absolutely. Please go ahead. I mean even with the output from all the runs stored in a single hdf5 file, data reduction, i.e. some form of combining the data from all the runs, is needed to enable an analysis.

I guess one can append to one hdf5 file even if these are multiple individual runs from different config files. We'll wait for #19 to work on data reduction - it will probably fall naturally after I am back from holidays.
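Appending indeed works with h5py's `"a"` mode regardless of which config file produced each run. A minimal sketch of the subsequent data-reduction step, assuming a hypothetical layout with one group per run, parameters as group attributes, and depth profiles as datasets:

```python
import h5py

def collect_profiles(h5path, field="porosity"):
    """Combine one depth profile from every run stored in a single
    hdf5 file into a list of (parameters, profile) pairs, as a
    starting point for analysis across the parameter sweep.

    Assumes a one-group-per-run layout with input parameters stored
    as group attributes; names are illustrative, not the real format.
    """
    combined = []
    with h5py.File(h5path, "r") as f:
        for run_name in sorted(f):        # iterate over all run groups
            grp = f[run_name]
            params = dict(grp.attrs)      # input parameters of this run
            combined.append((params, grp[field][:]))
    return combined
```

From pairs like these, one could then build the multidimensional graph: e.g. plot the chosen field's depth profile once per run, colored by the varied parameter.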