CWorthy-ocean / C-Star

C-Star is a Python package for setting up and running ocean model simulations, with a particular focus on marine carbon dioxide removal (mCDR) applications.
https://c-star.readthedocs.io

`Component`s should be independent #121

Open dafyddstephenson opened 1 week ago

dafyddstephenson commented 1 week ago

Conceptually, a Component should be able to run standalone, without being defined as part of a Case and without being coupled to any other Component.

The Component class shares many of its methods with Case: setup, build, pre_run, run, and post_run. When these methods are called on Case, Case loops over each Component and calls the methods on them in turn.

A Component should be able to run all of these methods without existing in the context of a Case, but it currently can't do so sensibly, because certain fundamental aspects of a simulation are held in the Case class, out of reach of the Component instance. Examples are the Case.start_date and Case.end_date attributes, which are used to pass an n_time_steps parameter to Component.run(), and the Case.caseroot attribute, which is used to pass an output_dir parameter to Component.run().
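To make the dependency concrete, here is a minimal sketch of the pattern described above. It is not the actual C-Star code: the `time_step_seconds` attribute, the way `n_time_steps` is computed, and the return value of `run()` are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class Component:
    """Hypothetical stand-in for a C-Star Component."""

    name: str
    time_step_seconds: int  # assumed attribute, for illustration only

    def run(self, n_time_steps: int, output_dir: Path) -> str:
        # A real Component would launch the model here; this only reports
        # the arguments it was handed by the Case.
        return f"{self.name}: {n_time_steps} steps -> {output_dir.as_posix()}"


@dataclass
class Case:
    """Hypothetical stand-in for a C-Star Case."""

    caseroot: Path
    start_date: datetime
    end_date: datetime
    components: list = field(default_factory=list)

    def run(self) -> list:
        # The simulation period and the output location live on the Case,
        # so a Component cannot work them out on its own.
        duration = (self.end_date - self.start_date).total_seconds()
        results = []
        for component in self.components:
            n_time_steps = int(duration // component.time_step_seconds)
            output_dir = self.caseroot / "output" / component.name
            results.append(component.run(n_time_steps, output_dir))
        return results
```

In this sketch, calling `Component.run()` directly requires the caller to supply values that only the Case knows, which is the coupling the proposal below tries to remove.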

I propose that we:

1. Restructure the caseroot directory.

The caseroot currently resembles, e.g.:

.
└── caseroot
    ├── output
    │   └── ROMS
    ├── additional_source_code
    │   └── ROMS
    │       ├── bgc.opt
    │       ├── cppdefs.opt
    │       └── my_custom_module.F
    ├── input_datasets
    │   └── ROMS
    │       ├── input_dataset1.nc
    │       └── input_dataset2.nc
    └── namelists
        └── ROMS
            └── roms.in

This is, in my opinion, more navigable to a user exploring it on the filesystem, but jumbles the hierarchical order of classes in C-Star. It would be more compatible with C-Star's design to structure it as follows:

.
└── caseroot
    └── ROMS
        ├── output
        ├── additional_source_code
        │   ├── bgc.opt
        │   ├── cppdefs.opt
        │   └── my_custom_module.F
        ├── input_datasets
        │   ├── input_dataset1.nc
        │   └── input_dataset2.nc
        └── namelists
            └── roms.in

The advantage of this structure is that it reflects the design of C-Star. Perhaps more importantly, a component_root attribute naturally follows from this structure, allowing a user to run ROMS outside of the context of a Case with other Components, without even defining a Case or caseroot. Following #115, this is likely to be the only context in which C-Star is run for the foreseeable future. If the user is running the Component as part of a Case, the Component.component_root (or whatever we call it) would just be defined as a subdirectory of Case.caseroot.
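A rough sketch of how component_root could behave in both situations is given below. All names and signatures here are hypothetical (the attribute name itself is still undecided), and the `output_dir` property is an assumption added only to show how paths would hang off the root.

```python
from pathlib import Path
from typing import Optional


class Component:
    """Hypothetical Component that owns its own root directory."""

    def __init__(self, name: str, component_root: Optional[str] = None):
        self.name = name
        # Standalone use: the user supplies the root directly.
        # Within a Case: the Case fills this in (see Case.__init__ below).
        self.component_root = (
            Path(component_root) if component_root is not None else None
        )

    @property
    def output_dir(self) -> Path:
        # Assumed layout: <component_root>/output, as in the proposed tree.
        if self.component_root is None:
            raise ValueError("component_root has not been set")
        return self.component_root / "output"


class Case:
    """Hypothetical Case that places each Component under its caseroot."""

    def __init__(self, caseroot: str, components: list):
        self.caseroot = Path(caseroot)
        self.components = list(components)
        for component in self.components:
            # Each component_root is a subdirectory of the caseroot.
            component.component_root = self.caseroot / component.name
```

With this arrangement, `Component("ROMS", "my_roms_run")` is usable with no Case at all, while a Component passed to a Case ends up rooted at `caseroot/ROMS`.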

2. Add start_date and end_date as attributes on a Component.

In the event that the Component contributes to a Case, the Case could select the earliest/latest start/end dates from all the Components, or something like that. Suggestions 1 & 2 together would allow independence of the Component class.
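One possible reading of "earliest/latest start/end dates" is sketched below. This is only an illustration of that suggestion, not a settled design: the class layout is hypothetical, and taking the minimum start date and the maximum end date is just one of several rules a Case could apply.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Component:
    """Hypothetical Component carrying its own simulation period."""

    name: str
    start_date: datetime
    end_date: datetime


class Case:
    """Hypothetical Case whose period is derived from its Components."""

    def __init__(self, components: list):
        self.components = list(components)

    @property
    def start_date(self) -> datetime:
        # Earliest start date among all Components.
        return min(c.start_date for c in self.components)

    @property
    def end_date(self) -> datetime:
        # Latest end date among all Components.
        return max(c.end_date for c in self.components)
```

Because the dates now live on each Component, a single Component can compute its own number of time steps without consulting a Case.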

3. Consider eliminating the Case class altogether and renaming Component.

Following #115, there is (well, will be) only a single Component supported by C-Star. Following the above suggestions, it will be able to run as a standalone object. It is certainly worthwhile to anticipate that this will change, but I don't think it makes sense to accommodate that change without thoroughly considering how it will look. Building an umbrella class that has no formal coupling infrastructure and just calls all its constituent components in a loop does not reflect expected usage of a multi-component system.

Without Case, however, the name Component is a bit redundant (component of what?) and should thus be changed. To what, I'm unsure. Maybe we could ask the public as an outreach effort.

@matt-long @NoraLoose @TomNicholas it would be great to hear your thoughts on this

matt-long commented 6 days ago

Thanks for the conversation today, @dafyddstephenson. I support this conceptual change.

As discussed, the term component can be thought of as a "component of the C-Star workflow." I am not opposed to the term component, but if we are to keep it, we need to be careful to disambiguate it from the use of the term in coupled models, where "component" refers to a "component model."

Alternative terminology:

I kind of like subsystem — the flavor of "system" rings true with our ROMS-MARBL example and even, perhaps, a coupled ESM.

We were focused on geophysical codes today — but more broadly our workflows will require postprocessing analysis.

Do we consider the analysis sequences as another flavor of a subsystem?

From a workflow management perspective, what is the most generic description of a subsystem?

The methods:

The various flavors of subsystem may have differences in their APIs and additional "low level" methods — but the concept of a blueprint could flow through all. Could it?