Open aaraney opened 7 months ago
There are a few other edge cases that I will document in this issue in a bit. Need to run a few errands.
I need clarification on one of the situations that was noted, but there may not be a bug here. The one I am uncertain of is when the `main_output_variable` is set to a duplicated BMI output variable name used in multiple nested modules. That should work fine as long as all but one nested module has a mapped alias configured for said variable. The "main" output should then come from the duplicated variable of the one nested module without a configured alias for it.
@aaraney, I can't tell from the description whether something else is being observed. You mentioned the "last" module, but there should only be one (at most) without an alias. Are you saying that whether and how (i.e., to which module) the alias was applied was irrelevant to what was actually used for `main_output_variable`?
Regarding the other described scenarios:
Regarding the `main_output_variable` value: it is a required config param, so its being missing, or being configured to something that doesn't match any of the available outputs, should also produce an error. I think those are all that were mentioned, but let me know if I'm missing something.
> Are you saying that whether and how (i.e., to which module) the alias was applied was irrelevant to what was actually used for `main_output_variable`?
Not exactly, I am saying that the top-level `main_output_variable` in a multi-bmi does not resolve alias names.
```json5
// ...
"main_output_variable": "main_output", // the last module will be queried for this output variable name verbatim; the alias is not translated
"modules": [
    {
        // ...
        "main_output_variable": "output",
        "variables_names_map": {
            "input": "value",
            "output": "main_output"
        }
    }
    },
    {
        // ...
        "main_output_variable": "output",
        "uses_forcing_file": false,
        "variables_names_map": {
            "input": "value"
        }
// ...
```
```json5
// ...
"main_output_variable": "output", // the last module will still be queried for "output" despite the presence of an alias
"modules": [
    {
        // ...
        "main_output_variable": "output",
        "variables_names_map": {
            "input": "value_2"
        }
    }
    },
    {
        // ...
        "main_output_variable": "output",
        "uses_forcing_file": false,
        "variables_names_map": {
            "input": "value",
            "output": "dontuse"
        }
// ...
```
I might be entirely wrong here, but isn't this intended? Multi-BMI, to my knowledge, was meant to pipe models together in a way that forms a tree of models -- whereas the above is attempting to output an ensemble of models? Again, I might be completely wrong there.
> I might be entirely wrong here, but isn't this intended?
Fair point, I may be misinterpreting the intended functionality of a multi-bmi. In my mind, it seems that you should be able to create a pipe of models that has an overall `main_output_variable` (top level in the multi-bmi `params` section) from any stage in the pipe. That might not be possible due to output alias constraints, though?
Other weird edge case that does not throw an error nor do what I would expect. This example has two models:

- `pipe_model_2` has one input, `input`, and two outputs, `output` and `output_2`. `input` is copied to both `output` and `output_2`.
- `pipe_model` is the same as I've been using in the previous examples: one input and one output variable, `input` and `output` respectively. `input` is copied to `output`.

The forcing contains two variables, `value` and `value_2`. `value` looks like 1, 2, 3, 4, ...; `value_2` looks like 2, 4, 6, 8, ...
The odd thing is the catchment output: `output_2` is not present, and adding `output_variables` has no effect. Note that `output` is from `pipe_model` (the second module), not the first. This detail is consistent with the above examples.
edit: Moving `"output_variables": ["output", "output_2"]` to the multi-bmi `params` does work. ~~However, we still have the same problem of "last module" wins. So if there are two modules that output, for example, evapotranspiration, and both modules use the same output variable name, and you want the evapotranspiration output from the module listed earlier in the `modules` list, there doesn't appear to be a way to do that, unless I'm missing something.~~
Okay, so it seems that the solution is to specify the output variable alias names in the `output_variables` mapping in the top-level multi-bmi `params` section. `output_variables` does respect alias names, whereas the top-level `main_output_variable` appears not to. Additionally, the top-level `main_output_variable` seems to have no effect on whether a variable is included in the final catchment output (see the last example). Looking at the BMI calls the framework is making, the `main_output_variable` is queried, but not included in the output.
This might be documented somewhere, but I've not found it. The variables that are output by default differ between the single-formulation case and the multi-bmi case. In the single-formulation case, unless `output_variables` is specified, all output variables are written as output. In the multi-bmi case, only the last module's variables are written as output. The top-level multi-bmi `main_output_variable` must be some module's output variable, but it does not dictate what output is written. The only way, it seems, to have output variables written from multiple modules in the multi-bmi case is to specify a top-level `output_variables` (top level in the multi-bmi `params` section) and list the output variables / aliases to include. I'm not sure if this behavior is what is intended; if it is, we should more clearly document the differences.
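The selection rules described above can be sketched as a toy Python function. This is purely illustrative, not the framework's actual logic; the `"outputs"` key below is a stand-in for a module's BMI output variable names, not a real config field.

```python
# Toy model of the observed output-selection behavior; NOT ngen code.
# "outputs" is a stand-in for a module's BMI output variable names.

def default_output_set(formulation: dict) -> list[str]:
    """Which variables would be written to the catchment output file:
    an explicit top-level 'output_variables' list (which may use alias
    names) wins; otherwise a single formulation writes all of its
    outputs, while a multi-bmi writes only the last module's outputs."""
    params = formulation["params"]
    if "output_variables" in params:
        return params["output_variables"]
    modules = params.get("modules")
    if modules is None:  # single formulation: every output variable
        return params["outputs"]
    return modules[-1]["params"]["outputs"]  # multi-bmi: last module only

single = {"params": {"outputs": ["output", "output_2"]}}
multi = {"params": {"modules": [
    {"params": {"outputs": ["output"]}},
    {"params": {"outputs": ["output_2"]}},
]}}
assert default_output_set(single) == ["output", "output_2"]
assert default_output_set(multi) == ["output_2"]
```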
@program-- and @robertbartel, see the "edit:" section of this issue.
Ok, @aaraney, I'm still remembering parts of how things work ...
Indeed, `main_output_variable` doesn't have much to do with output files; it controls the returned value of the formulation's `get_response` function: i.e., what's "output" when the framework advances modeling. The function returning a `double` value goes back all the way to (at least) the original `Simple_Lumped_Model_Realization` 4 years ago. I want to say that was the first `Catchment_Formulation` implementation. Whether or not `get_response` still should return a value like this is up for debate.
For multi-formulations, technically the `get_response` "main" return value will come from the module at the `primary_module_index`. The default, set by `Bmi_Multi_Formulation::create_multi_formulation`, is the last index. While there is a `set_index_for_primary_module` setter, I'm fairly sure it's never used anywhere (i.e., only the default of "last module" is ever used). One way or another, that's probably technically a bug (and it probably is still there because that functionality hasn't really mattered).
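That resolution order can be sketched in a few lines of Python. This is an illustrative toy, not the real C++ `Bmi_Multi_Formulation`:

```python
# Illustrative sketch (NOT ngen's C++ implementation): a multi
# formulation's get_response resolves to one nested module's value,
# with the primary module index defaulting to the last module.

class ToyMultiFormulation:
    def __init__(self, modules: list[dict]):
        self.modules = modules
        # mirrors the default described above: last index wins
        self.primary_module_index = len(modules) - 1

    def get_response(self) -> float:
        # the value comes from the primary (by default, last) module only
        return self.modules[self.primary_module_index]["output"]

multi = ToyMultiFormulation([{"output": 1.0}, {"output": 2.0}])
assert multi.get_response() == 2.0  # "last module wins"
```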
> I might be entirely wrong here, but isn't this intended? Multi-BMI, to my knowledge, was to pipe models together in a way that forms a tree of models -- whereas the above is attempting to output an ensemble of models? Again I might be completely wrong there
It's a long time ago, but I think the original intent was to be able to modularly compose modeling. If not originally, it was not long thereafter a desired capability to be able to compose not just the process modeling, but also the per-time-step results (i.e., data output to files), from any output variables within the composition. I think that's less of a tree and more of an ensemble (though I feel like "ensemble" often has contextual implications, and I'm not sure if any are intended here). That said, the behavior around `main_output_variable` does work differently, which is another good reason to consider whether it still belongs.
> I think that's less of a tree and more of an ensemble (though I feel like "ensemble" often has contextual implications, and I'm not sure if any are intended here).
@robertbartel Haha, yeah sorry, I should've been more specific with what I meant by tree and ensemble. For trees, what I meant is multiple models that are composed (in the way you're describing), such that the "tree" of models represents one single model:
```mermaid
flowchart LR
    A --> C
    B --> C
    D --> E
    E --> F
    C --> F
```
where each model passes its outputs to the next model, and F's outputs are the final outputs of the multi-bmi model. i.e. a pipeline of models representing a single model
For an ensemble:
```mermaid
flowchart LR
    A --> Output
    B --> Output
    C --> Output
```
where each model in the ensemble passes its outputs to the catchment file, so the output catchment file would have all outputs from each model, and they may or may not interact with one another. I think this is what the intention of making the `formulations` config option an array was for?
As always, thanks for the in-depth response, @robertbartel!
> Whether or not `get_response` still should return a value like this is up for debate.
Yeah, I went down a rabbit hole on Friday to better understand the implementation of the catchment-to-nexus aggregation, specifically looking in `Layer::update_models()`. There definitely seems to be some cruft that we've accumulated in this section of the codebase (e.g. magic numbers and undocumented assumptions). At a minimum, we should bolster the documentation around `main_output_variable`. Specifically, what are the implications, assumptions, and invariants necessary to be the output of `get_response()`? More broadly, since we have been talking about / working on layer mechanics, it seems like a viable time to consider the necessity of `get_response` returning a value. Pinging @hellkite500 and @donaldwj just so you have this on your radar.
> where each model passes its outputs to the next model, and F's outputs are the final outputs of the multi-bmi model. i.e. a pipeline of models representing a single model
@program--, I think you are right on with that description. However, I don't think that implies that the last model's output is what is written to disk. It could be that the processes each model represents necessitate ordering the models in a realization config file such that the last model does not output the variable you are interested in. Of course, we could create some kind of sink model that takes as input the model output that you want, but I would hope we can find a more long-term solution.
> I think this is what the intention of making the `formulations` config option an array was for?
Yeah, I don't know that the vision was for the framework to handle "creating" aggregate ensembles (i.e., the actual aggregation of multiple `formulations` into a single output), but rather running an ensemble of model `formulations`. To be fair, the domain language around "ensemble" is pretty fuzzy. It's common to use "ensemble" to mean both an aggregation of multiple models into a single forecast, and multiple forecasts that originate from a common initialization time (think spaghetti plot). @hellkite500, can you speak to that?
@program--, thanks for the explanations. Unfortunately, things are coupled a bit more loosely than that. It is in some ways both, but strictly neither of those things.

Output file generation is controlled by the formulation. The formulation has access to output variables from the full ensemble of nested modules, but it selects what from the ensemble to include in output according to the configuration.

The formulation defers to its nested modules to actually perform the modeling, and it processes them in a particular order. The formulation can provide any of the outputs of any of its nested modules to serve as an input value for any of its nested modules. Typically that looks something like this, which essentially fits under the tree category:
```mermaid
flowchart LR
    A --> B
    B --> C
    A --> C
    C --> D
    B --> D
```
However, it can also look like this:
```mermaid
flowchart LR
    A --> B
    B --> C
    A --> C
    C --> A
    C --> D
    B --> D
```
Where the C to A edge spans across the boundary of the current and the previous time step; i.e., if C` represents module C of the previous time step, something like this:
```mermaid
flowchart LR
    A --> B
    B --> C
    A --> C
    C` --> A
    C --> D
    B --> D
```
That's still kind of a tree, but not in the domain of what I think you had in mind.
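The cross-time-step back edge above can be sketched as a toy time loop. The arithmetic here is made up; only the wiring pattern (A consuming C's previous-step output) reflects the description:

```python
# Toy illustration of the cross-time-step back edge: A reads the value C
# produced on the *previous* step (C`), while B and C read current-step
# values. The arithmetic is invented; only the wiring pattern matters.

def run_step(c_prev: float) -> tuple[float, float, float]:
    a = 1.0 + c_prev   # A consumes C's previous-step output
    b = a * 2.0        # B consumes A's current-step output
    c = a + b          # C consumes A and B from the current step
    return a, b, c

c_prev = 0.0           # base case: a seed value for the very first step
history = []
for _ in range(3):
    a, b, c = run_step(c_prev)
    history.append(c)
    c_prev = c         # C's output crosses the time-step boundary

assert history == [3.0, 12.0, 39.0]
```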
@robertbartel Huh, that's interesting. Thank you for that explanation! I realize I should've probably said DAG instead of tree, but you got what I meant 😄.
I didn't consider a formulation having a back edge in the way you described C` and C. How would that be modeled though? For example, the base case for A when C has no previous timestep?
> Yeah, I don't know that the vision was for the framework to handle "creating" aggregate ensembles (so the actual aggregation of multiple formulations into a single output). But instead running an ensemble of model formulations
So the formulation definition being a list is a precursor to supporting this type of functionality. The easiest case is definitely just running multiple formulations, each generating its own output (the spaghetti plot). However, once this capability exists, it wouldn't be too challenging, I think, to add a configuration option that allows some form of ensemble aggregation to occur during the model run, supporting some common aggregation mechanisms like mean, median, etc., and to use that throughout the simulation as well, if that is a useful mechanic (I suspect it will be once we get there).
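The aggregation idea can be sketched as a per-time-step reducer over ensemble member outputs. This is illustrative only; no such option exists in the framework today:

```python
# Sketch of in-run ensemble aggregation: combine each time step's
# outputs from the ensemble members with a configurable reducer.
# Illustrative only; not an existing framework feature.
import statistics

AGGREGATORS = {"mean": statistics.fmean, "median": statistics.median}

def aggregate_step(member_outputs: list[float], method: str = "mean") -> float:
    """Reduce one time step's ensemble member outputs to a single value."""
    return AGGREGATORS[method](member_outputs)

assert aggregate_step([1.0, 2.0, 3.0], "mean") == 2.0
assert aggregate_step([1.0, 2.0, 9.0], "median") == 2.0
```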
@hellkite500, exactly. It just comes down to what you want to support and what is in scope for framework capabilities.
> I didn't consider a formulation having a back edge in the way you described C` and C. How would that be modeled though? For example, the base case for A when C has no previous timestep?
@program--, great question! You should specify the initial input in a top-level `default_output_values` block. This is where it gets tricky and where we need more documentation. NextGen can, in some cases, provide a default value (e.g. `0.0` for `double`s) even when an input is not specified in `default_output_values`. However, it's not as clear what data types are supported and what their defaults are. I think @robertbartel is going to clarify this in a docs PR soon.
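For illustration, seeding the C -> A back edge for the first time step might look something like the fragment below. The exact schema of `default_output_values` (and the variable name used here) is an assumption for illustration; check the docs/source before relying on it:

```json5
// Hypothetical fragment of a top-level multi-bmi "params" section;
// the entry shape and the variable name "output_from_C" are assumptions.
"default_output_values": [
    // used as A's input until C has produced a value
    { "name": "output_from_C", "value": 0.0 }
],
```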
I think I found a case where the framework should throw?
Here is the setup and what should be happening graphically:
```mermaid
flowchart LR
    subgraph f [forcing]
        value
        value_2
    end
    subgraph m1 [module 1]
        input[input: value]
        input_2[input_2: value_2]
        output[output: output_2]
        output_2[output_2: output]
    end
    subgraph m2 [module 2]
        input_m2[input: output]
        output_m2[output: output_sink]
    end
    value -- 1,2,3 --> input
    value_2 -- 2,4,6 --> input_2
    input --> output
    input_2 --> output_2
    output_2 --> input_m2
    input_m2 --> output_m2
    output_m2 --> output_sink
```
In `variables_names_map` form:
"variables_names_map": {
"input": "value",
"input_2": "value_2",
"output": "output_2",
"output_2": "output"
}
```json5
// ...
"variables_names_map": {
    "input": "output",
    "output": "output_sink"
}
```
However, the catchment output file has the series `1,2,3,...` instead of `2,4,6,...`.
It seems sensible for the framework to throw if an output aliases an existing output in the same module. It appears to just ignore it.
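The kind of validation that could catch this can be sketched in Python. This is illustrative only, not ngen code; the function name and config shape are made up:

```python
# Sketch of a validation the framework could perform: reject a
# variables_names_map entry that aliases one output variable to the
# name of another output in the same module. Illustrative only.

def check_output_alias_collisions(outputs: list[str], names_map: dict[str, str]) -> None:
    for var, alias in names_map.items():
        if var in outputs and alias in outputs and alias != var:
            raise ValueError(
                f"output '{var}' aliased to '{alias}', which is itself "
                "an output of the same module"
            )

# Module 1's mapping from the example above should be rejected:
outputs = ["output", "output_2"]
names_map = {"input": "value", "input_2": "value_2",
             "output": "output_2", "output_2": "output"}
try:
    check_output_alias_collisions(outputs, names_map)
    raised = False
except ValueError:
    raised = True
assert raised
```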
Configuring a multi-bmi realization with two modules that share a common output variable name can produce errors or unexpected results. Similarly, if the outer multi-bmi module's `main_output_variable` is the shared common output variable name, the last module listed in the `modules` section wins. Meaning, the last module's output variable with the name associated with the multi-bmi's `main_output_variable` will be the output.

To explore this problem, I created a simple BMI Python module with a single `input` scalar variable and a single `output` scalar variable. The `output` is simply assigned from `input`. I put the module "back to back" in a multi-bmi realization (see below realization configs) to showcase these issues.

Failing to remap either module's output name produces the following error:
realization file
```json5
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "model_type_name": "",
          "allow_exceed_end_time": true,
          "main_output_variable": "output", // which module's `output` should this be? Currently it is always the last.
          "init_config": "/dev/null",
          "uses_forcing_file": false,
          "modules": [
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_1",
                "init_config": "green",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            },
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_2",
                "init_config": "blue",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "file_pattern": "cat-27.csv",
      "path": "./data/",
      "provider": "CsvPerFeature"
    }
  },
  "time": {
    "start_time": "2015-12-01 00:00:00",
    "end_time": "2015-12-01 23:00:00",
    "output_interval": 3600
  }
}
```

Error with remapping but not correctly specifying the multi-bmi `main_output_variable` (not the most helpful, because it's a symptom):

realization file
```json5
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "model_type_name": "output",
          "allow_exceed_end_time": true,
          "main_output_variable": "", // <- note this is not set. this is where the empty string comes from in the error.
          "init_config": "/dev/null",
          "uses_forcing_file": false,
          "modules": [
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_1",
                "init_config": "green",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface",
                  "output": "output_1"
                }
              }
            },
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_2",
                "init_config": "blue",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "file_pattern": "cat-27.csv",
      "path": "./data/",
      "provider": "CsvPerFeature"
    }
  },
  "time": {
    "start_time": "2015-12-01 00:00:00",
    "end_time": "2015-12-01 23:00:00",
    "output_interval": 3600
  }
}
```

Hopefully this will be more helpful in showcasing what is occurring. In the below details section are the BMI function calls made by each module leading up to the error (kind of a poor man's stack trace). They are labeled 1 and 2, respectively, following their ordering in the realization config file.
BMI module function calls
```shell
NGen Framework 0.1.0
Building Nexus collection
Building Catchment collection
Initializing formulations
1: initialize('green')
1: get_time_units()
1: get_start_time()
1: get_output_item_count()
1: get_output_var_names()
1: get_input_item_count()
1: get_input_var_names()
1: get_output_item_count()
1: get_output_var_names()
2: initialize('blue')
2: get_time_units()
2: get_start_time()
2: get_output_item_count()
2: get_output_var_names()
2: get_input_item_count()
2: get_input_var_names()
2: get_output_item_count()
2: get_output_var_names()
Building Feature Index
Catchment topology is dendritic.
Running Models
Running timestep 0
1: get_current_time()
1: get_current_time()
1: get_current_time()
1: get_input_item_count()
1: get_input_var_names()
1: get_var_nbytes('input')
1: get_var_itemsize('input')
1: get_var_type('input')
1: get_var_units('input')
WARN: Unit conversion unsuccessful - Returning unconverted value! ("Unable to parse out_units value -")
1: get_var_itemsize('input')
1: get_var_type('input')
1: get_var_nbytes('input')
1: get_var_itemsize('input')
1: set_value('input', array([361.30001831]))
1: get_time_step()
1: update()
1: time stepped 1
1: get_var_type('output')
1: get_var_itemsize('output')
1: get_var_type('output')
1: get_var_itemsize('output')
1: get_input_item_count()
1: get_input_var_names()
1: get_var_type('output')
1: get_value_at_indices('output', array([0.]), array([0], dtype=int32))
2: get_current_time()
2: get_current_time()
2: get_input_item_count()
2: get_input_var_names()
2: get_var_nbytes('input')
2: get_var_itemsize('input')
2: get_var_type('input')
2: get_var_units('input')
WARN: Unit conversion unsuccessful - Returning unconverted value! ("Unable to parse out_units value -")
2: get_var_itemsize('input')
2: get_var_type('input')
2: get_var_nbytes('input')
2: get_var_itemsize('input')
2: set_value('input', array([0.]))
2: get_time_step()
2: update()
2: time stepped 1
2: get_var_type('output')
2: get_var_itemsize('output')
2: get_var_type('output')
2: get_var_itemsize('output')
2: get_input_item_count()
2: get_input_var_names()
2: get_var_type('output')
2: get_value_at_indices('output', array([0.]), array([0], dtype=int32))
2: get_var_type('')
libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: KeyError: ''
```

python bmi module source
To use the realization files included above, you will need to create a dir structure of `model/bmi.py`, set up a virtual environment, install `bmipy`, and that should mostly work.

```python
import numpy as np
from bmipy import Bmi
from typing_extensions import override


class BmiA(Bmi):
    _name = "A"
    _input_var_names = ("input",)
    _output_var_names = ("output",)
    _time_step_size = 3600  # sec in hour

    def __init__(self):
        input = np.array([0], dtype=np.float64)
        unit = "-"
        self._values = {
            "input": input,
            "output": input,
        }
        self._var_units = {
            "input": unit,
            "output": unit,
        }
        self._var_loc = {
            "input": "node",
            "output": "node",
        }
        self.started = False
        self._time_step = 0
        self._start_time = 0.0
        self._end_time = np.finfo("d").max
        self._time_units = "s"

    def _run_timestep(self):
        self._time_step += 1
        print(f"time stepped {self._time_step}")

    @override
    def initialize(self, config_file: str) -> None: ...

    @override
    def update(self):
        self.update_until(self._time_step_size + self.get_current_time())

    @override
    def update_until(self, time: float) -> None:
        self.started = True
        n_steps = (time - self.get_current_time()) / self.get_time_step()
        for _ in range(int(n_steps)):
            self._run_timestep()

    @override
    def finalize(self): ...

    @override
    def get_var_type(self, name: str) -> str:
        return str(self.get_value_ptr(name).dtype)

    @override
    def get_var_units(self, name: str) -> str:
        return self._var_units[name]

    @override
    def get_var_nbytes(self, name: str) -> int:
        return self.get_value_ptr(name).nbytes

    @override
    def get_var_itemsize(self, name: str) -> int:
        return np.dtype(self.get_var_type(name)).itemsize

    @override
    def get_var_location(self, name: str) -> str:
        return self._var_loc[name]

    @override
    def get_value_ptr(self, name: str) -> np.ndarray:
        return self._values[name]

    @override
    def get_value(self, name: str, dest: np.ndarray) -> np.ndarray:
        dest[:] = self.get_value_ptr(name).copy().flatten()
        return dest

    @override
    def get_value_at_indices(
        self, name: str, dest: np.ndarray, inds: np.ndarray
    ) -> np.ndarray:
        dest[:] = self.get_value_ptr(name).take(inds)
        return dest

    @override
    def set_value(self, name: str, src: np.ndarray) -> None:
        val = self.get_value_ptr(name)
        val[:] = src.copy().reshape(val.shape)

    @override
    def set_value_at_indices(
        self, name: str, inds: np.ndarray, src: np.ndarray
    ) -> None:
        val = self.get_value_ptr(name)
        val.flat[inds] = src.copy()

    @override
    def get_component_name(self) -> str:
        return self._name

    @override
    def get_input_item_count(self) -> int:
        return len(self._input_var_names)

    @override
    def get_output_item_count(self) -> int:
        return len(self._output_var_names)

    @override
    def get_input_var_names(self) -> tuple[str]:
        return self._input_var_names

    @override
    def get_output_var_names(self) -> tuple[str]:
        return self._output_var_names

    @override
    def get_start_time(self) -> float:
        return self._start_time

    @override
    def get_end_time(self) -> float:
        return self._end_time  # type: ignore

    @override
    def get_current_time(self) -> float:
        return self._time_step * self._time_step_size

    @override
    def get_time_step(self) -> float:
        return self._time_step_size

    @override
    def get_time_units(self) -> str:
        return self._time_units

    @override
    def get_grid_shape(self, grid: int, shape: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_shape")

    @override
    def get_grid_spacing(self, grid: int, spacing: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_spacing")

    @override
    def get_grid_origin(self, grid: int, origin: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_origin")

    @override
    def get_grid_type(self, grid: int) -> str:
        raise NotImplementedError("get_grid_type")

    @override
    def get_var_grid(self, name: str) -> int:
        raise NotImplementedError("get_var_grid")

    @override
    def get_grid_rank(self, grid: int) -> int:
        raise NotImplementedError("get_grid_rank")

    @override
    def get_grid_size(self, grid: int) -> int:
        raise NotImplementedError("get_grid_size")

    @override
    def get_grid_edge_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_edge_count")

    @override
    def get_grid_edge_nodes(self, grid: int, edge_nodes: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_edge_nodes")

    @override
    def get_grid_face_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_face_count")

    @override
    def get_grid_face_nodes(self, grid: int, face_nodes: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_face_nodes")

    @override
    def get_grid_node_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_node_count")

    @override
    def get_grid_nodes_per_face(
        self, grid: int, nodes_per_face: np.ndarray
    ) -> np.ndarray:
        raise NotImplementedError("get_grid_nodes_per_face")

    @override
    def get_grid_face_edges(self, grid: int, face_edges: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_face_edges")

    @override
    def get_grid_x(self, grid: int, x: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_x")

    @override
    def get_grid_y(self, grid: int, y: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_y")

    @override
    def get_grid_z(self, grid: int, z: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_z")
```

edit: I no longer think this is strictly a bug; there could be unintended behavior, but that is up for debate. Instead, I did not understand the behavior of multi-bmi vs. a single formulation, and that led to confusion. At a minimum, I will contribute documentation to clarify the behavioral differences.
My confusion stemmed from the differences in what is output to disk (the catchment output file) in a single-formulation case vs. a multi-bmi case; the variables that are output by default differ between the two.
- Single formulation: all output variables are written as output unless `output_variables` is specified. If `output_variables` is specified, the set of output variables written can be limited.
- Multi-bmi formulation: only the last listed module's output variables in the `modules` list are written as output. If the last listed module specifies `output_variables`, the set of output variables written (must use the output name, not the alias name) can be limited just as in the single formulation case.

To have output variables written from multiple `modules` in a multi-bmi formulation, you must list all desired output variables (or their alias equivalent names) in the multi-bmi config's `output_variables` section (e.g. `global.formulations[0].params.output_variables`).
Similarly, a multi-bmi formulation's `main_output_variable` does not control whether a value is output or not. It is valid to specify a `main_output_variable` that is not written as output. I don't know the functional purpose of `main_output_variable`; however, for NextGen to run, a valid output variable name (not alias) from any module in the `modules` list must be provided.