Rework runtime model data storage

FormingWorlds / PROTEUS

Coupled atmosphere-interior framework to simulate the temporal evolution of rocky planets.

https://proteus-code.readthedocs.io

Apache License 2.0


Rework runtime model data storage #95

Status: open. nichollsh opened this issue 1 month ago

nichollsh commented 1 month ago

The way PROTEUS currently stores information is messy, being split between COUPLER_options and runtime_helpfile. This poses a significant risk of introducing unexpected behaviours and also degrades performance. It makes adding new modules difficult, and it is the main obstacle to allowing simulations to be resumed (issue #90).

We should consider storing model variables in a dataclass. This would allow the information to be passed around more easily and transparently. These data can easily be converted to a dictionary and then written to a CSV file.
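A minimal sketch of this idea, assuming hypothetical field names (time, T_surf, F_atm) rather than PROTEUS's actual variables: a dataclass holds one snapshot of the model state, dataclasses.asdict converts it to a dictionary, and csv.DictWriter writes each snapshot as one row of a table.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RuntimeState:
    """One snapshot of the evolving model state (hypothetical fields)."""
    time: float = 0.0      # model time [yr]
    T_surf: float = 0.0    # surface temperature [K]
    F_atm: float = 0.0     # net atmospheric flux [W m-2]


def write_states(states, stream):
    """Write RuntimeState snapshots to `stream` as CSV, one row per snapshot."""
    header = [f.name for f in fields(RuntimeState)]
    writer = csv.DictWriter(stream, fieldnames=header)
    writer.writeheader()
    for state in states:
        writer.writerow(asdict(state))  # dataclass -> dict -> CSV row


if __name__ == "__main__":
    history = [RuntimeState(0.0, 3000.0, 1.0e5), RuntimeState(1.0e3, 2900.0, 8.0e4)]
    buf = io.StringIO()
    write_states(history, buf)
    print(buf.getvalue())
```

Because the column names come from the dataclass fields, adding a variable for a new module means adding one field rather than editing the writer.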

Dan is already doing something similar with atmodeller. JANUS and AGNI also store model information in a struct-like object. It would make sense for PROTEUS to do the same.

@lsoucasse and @timlichtenberg, what are your thoughts on this?

This is connected to issues #74, #94, #82, #76, #87, #70.

nichollsh commented 1 month ago

This is quite a drastic change to the model. As an easier alternative, we could rework the current storage structures with clear headers and defined purposes.

For example, COUPLER_options could hold only the fixed values set by the cfg file, while a separate variable holds quantities calculated at runtime. We could use xarray rather than pandas for this, since it lets us easily attach units and write to netCDF files (this would automatically resolve #70).
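A plain-Python sketch of that split, not an xarray example, assuming hypothetical names (CouplerConfig, RuntimeLog) and example parameters: the fixed cfg values live in a frozen dataclass, so any attempt to change them mid-run raises an error, while runtime quantities are appended as one row per coupling step without copying the constants into every row.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class CouplerConfig:
    """Fixed values read once from the cfg file (hypothetical fields)."""
    star_mass: float      # [M_sun]
    planet_mass: float    # [M_earth]
    time_target: float    # [yr]


@dataclass
class RuntimeLog:
    """Quantities calculated at runtime, one row per coupling step."""
    rows: list = field(default_factory=list)

    def record(self, **values):
        self.rows.append(dict(values))


config = CouplerConfig(star_mass=1.0, planet_mass=1.0, time_target=1.0e9)
log = RuntimeLog()
log.record(time=0.0, T_surf=3000.0)
log.record(time=1.0e3, T_surf=2900.0)

try:
    config.star_mass = 2.0  # fixed parameters cannot be changed mid-run
except FrozenInstanceError:
    print("config is read-only")
```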

lsoucasse commented 1 month ago

I agree we need to revisit the storage of input information. What is important to me is to distinguish the parameters/data that are fixed for the whole simulation from those that evolve with time (currently, the constant parameters are copied into every line of the COUPLER_options file).

timlichtenberg commented 1 month ago

I fully agree with both of you: the way the data are currently stored is illogical and leads to misunderstandings. However, I do believe it is important to keep at least some global parameters easily accessible. Everyone understands how to open a .txt or .csv file, but a .json file, for example, is difficult for a human to read. So perhaps we can find an option that makes the data storage more homogeneous while still preserving some accessibility?

nichollsh commented 1 month ago

Maybe a list of desired characteristics would be useful, and then we can decide on the best method for approaching this.

Separation of input parameters and output data
Minimal file reads/writes
Clearly named variables
Can be written to a CSV file as a table (rows indicating model time)
The units of each variable are available (e.g. mixing ratios specified as VMR not MMR)
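One way several of these characteristics could be met with only the standard library, assuming hypothetical variable names and a hypothetical UNITS table: units are stored next to the variable names and emitted in the CSV header, each row corresponds to one model time, and rows are buffered in memory and written in a single pass rather than re-reading the file at every step.

```python
import csv
import io

# Hypothetical variable names and their units, kept alongside the data
UNITS = {
    "time": "yr",
    "T_surf": "K",
    "x_H2O": "VMR",  # mixing ratio stated explicitly as volume, not mass
}


class Helpfile:
    """Buffers one row per model time and writes a unit-labelled CSV in one pass."""

    def __init__(self, units):
        self.units = dict(units)
        self.rows = []

    def append(self, **values):
        missing = set(self.units) - set(values)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        self.rows.append(values)

    def write(self, stream):
        writer = csv.writer(stream)
        # Header such as "T_surf [K]" keeps units visible to a human reader
        writer.writerow(f"{name} [{unit}]" for name, unit in self.units.items())
        for row in self.rows:
            writer.writerow(row[name] for name in self.units)


hf = Helpfile(UNITS)
hf.append(time=0.0, T_surf=3000.0, x_H2O=0.5)
hf.append(time=1.0e3, T_surf=2900.0, x_H2O=0.4)
buf = io.StringIO()
hf.write(buf)
print(buf.getvalue())
```

Putting the units in the header keeps the file readable in a text editor or spreadsheet; an xarray Dataset with a units attribute on each variable could carry the same information into a netCDF file.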

Any other suggestions? I do think all of the points above can be achieved by passing around xarray objects, which isn't too different from the current method.