Closed NoraLoose closed 2 months ago
Option B is how I imagined it but reading this I think I prefer C. I think my preference is purely biased towards my experience with MATLAB roms-tools though, I'm not sure I can channel the perspective of a first-time user.
If we were to go B though I wonder if it makes sense for a single .ds
to return a list of datasets rather than having a separate.ds
and .ds_bgc
? Or perhaps some combination of B and C where the final example you present here (with two X_source
parameters) could return a list, but one X_source
parameter defaults to returning one dataset? If we assume widespread adoption in the ROMS community, I imagine most users will only be interested in physics modelling.
If we were to go B though I wonder if it makes sense for a single .ds to return a list of datasets rather than having a separate.ds and .ds_bgc?
If you're going to have .ds
return anything other than xr.Dataset
, you may as well have it return a DataTree
.
I think I prefer C. Conceptually there are two different boundary forcing datasets needed to force the simulation, so you have to use the BoundaryForcing
class twice. That seems reasonable to me. The only weird part about it is physics_source
vs bgc_source
, as opposed to e.g. source=
and type={'physics', 'bgc'}
.
I'm also leaning towards C I think. One question I had reading through this is what happens in option A and B if the user wants to set up a physics-only run?
Thanks for your comments! I think I also favor Option C!
The only weird part about it is physics_source vs bgc_source, as opposed to e.g. source= and type={'physics', 'bgc'}.
True, let's do
boundary_forcing = BoundaryForcing(
grid=grid,
start_time=start_time,
end_time=end_time,
source={"type": "physics", "name": "GLORYS", "path": path},
#source={"type": "bgc", "name": "CESM_REGRIDDED", "path": path, "climatology": True} # alternatively
)
and
surface_forcing = SurfaceForcing(
grid=grid,
start_time=start_time,
end_time=end_time,
source={"type": "physics", "name": "ERA5", "path": path},
#source={"type": "bgc", "name": "CESM_REGRIDDED", "path": path} # alternatively
)
@TomNicholas What naming convention do you recommend for the initial conditions, though? Currently, we have
surface_forcing = SurfaceForcing(
grid=grid,
start_time=start_time,
end_time=end_time,
physics_source={"name": "GLORYS", "path": path},
bgc_source={"name": "CESM_REGRIDDED", "path": path, "climatology": True} # optional
)
Keep the parameter names physics_source
and bgc_source
? Note that for the initial conditions we need to process these two together if BGC is to be included.
After yesterday's discussion we landed on the conclusion that we should move away from
datatree
for the.ds
property and instead use anxr.Dataset
that includes both physical and BGC variables.However, there is the complication that both the physical and BGC boundary variables need a dimension and coordinate
bry_time
(since this is hardcoded in ROMS), but the physical vs. BGCbry_time
have different lengths and hold different frequencies. Given this complication, I can see three options for how to move forward.Option A
The object
has a single attribute
.ds
that holds both physical and BGC variables. To avoid a name clash, the physical variables have the dimension and coordinatebry_time
, whereas the BGC variables have the dimension and coordinatebry_time_bgc
. The.save()
method will then split the physical and BGC variables into two separatexarray.Datasets
and renamebry_time_bgc
tobry_time
before saving to NetCDF.Option B
The object
has two attributes,
.ds
and.ds_bgc
. Each of these holds anxarray.Dataset
with the physical and BGC variables, respectively. Both of these datasets can have thebry_time
dimension and coordinate, and the.save()
method will just save these two datasets to two separate NetCDF files.Note that with this option, if the user skips the
physics_source
, i.e., doesthere won't be an attribute
.ds
(as for the other classes likeGrid
,TidalForcing
) but only an attribute.ds_bgc
, which could potentially lead to confusion.Option C
We force the user to process physical and BGC variables separately, i.e., via
and then both
physical_boundary_forcing
andbgc_boundary_forcing
have a single attribute.ds
that holds the respective variables with dimensionbry_time
.If the user tries to process both at the same time
they will get an error that tells them to specify either
physics_source
orbgc_source
, but not both at the same time. This may be confusing for the user because the behavior is opposite to what happens for the initial conditions. For the initial conditions you would not be able to treatphysics_source
andbgc_source
separately; if you only specified abgc_source
, but nophysics_source
, you would get an error.It would be great to get your opinions on what option you think is best @dafyddstephenson @TomNicholas @ubbu36 @abigale-wyatt.
Whatever option we decide on, I will implement the same for the surface forcing, even though we don't have the problem there that the time variable name is shared among physical and BGC variables. But I think it will be good to treat the
SurfaceForcing
class the same way as theBoundaryForcing
class, to not confuse the user even more.