Closed verseve closed 12 months ago
Thanks for tackling these issues.
I have found one more (possible) deviation from BMI in Wflow.jl: get_var_type
usually returns the type of the elements of the array/vector.
In the bmi-example-julia this is implemented as:
repr(eltype(m.temperature))
However, in Wflow.jl this is implemented as
string(typeof(param(model, name)))
So instead of Float64
, the returned string will be Vector{Float64}
.
Thanks for spotting this!
Yes, this should be eltype
instead of typeof
in Wflow.jl.
I think we need to decide what model grids we want to expose through BMI, with the following two options:
Finally, some variables in Wflow have an extra (3rd) dimension, for example classes
for flextopo
and layer
for sbm
(internally these are stored as a Vector of SVectors. I suggest we expose these variables as 2D (and not 3D) through BMI, and could do this in the same way as the wflow Python version (with adding an underscore and index to the variable name, e.g.: vertical.vwc_1
).
I'm a bit torn on this. My first thought was that the grids need to be unstructured to have a 1:1 mapping for performance reasons. Though from a usability perspective unstructured is really quite a bit more difficult, and as you say in our output files we also act like it is a 2D model. The BMI grid implementation is just not so well suited for what we do internally. It is a structured grid, we just don't store it as such internally.
I suggest we go for option 2. I think copying data from 1D internal to 2D BMI is not too costly if we do that for every timestep, provided we re-use the 2D memory. We can measure it first to be sure.
I suggest we expose these variables as 2D (and not 3D) through BMI
Is that assuming picking option 2? And "1D (and not 2D)" for option 1? Why do you prefer this? If we go for option 2, I think offering 3D grids to BMI here is the most obvious choice.
Thanks for your input @visr !
You are probably aware of this: Arrays are always flattened in BMI. A user always needs to convert the data (1D Array) to a 2D Array if this is required, for both option 1 and 2. But I agree, from a user perspective an unstructured grid with x
, y
(and z
) coordinates can be more difficult to deal with.
Below a bit more information on the different BMI grid types that are required for option 1 and 2 (including 3D grids).
If we go for option 1, the following (unstructured) grid types are required:
So, a total of 10 different grid types are required for option 1.
For option 2, the components kinematic wave (river, land and subsurface), vertical, lake and reservoir can be stored on a structured grid. If we go for option 2, I think we need the following grid types:
So, a total of 6 grid types. I think it is preferred to store the local inertial model as an unstructured grid (although this is not (yet) part of Wflow output), it is more explicit and not depending on certain assumptions (like a variable stored at a grid node is actually stored at a downstream edge), which should also make model couplings with these components more straightforward.
Is that assuming picking option 2? And "1D (and not 2D)" for option 1? Why do you prefer this? If we go for option 2, I think offering 3D grids to BMI here is the most obvious choice.
This would be the case for both option 1 and 2. I am suggesting this, because 3D data/grids make up a small part of Wflow, users are probably mostly interested in a subset of this data (for example soil moisture for only the upper soil layer and not 4 layers), and it would require less grid types. I am also fine with offering 3D data through BMI (as we also do as part of Wflow output).
I am also a bit torn on this. In the end I think option 2 is cleaner (and probably closer to how BMI is intended), and I agree, it is probably not too costly (compared to option 1). In addition, a possible risk with option 1 is that we end up with many different grid types over time, when more variables are added to Wflow.
Arrays are always flattened in BMI. A user always needs to convert the data (1D Array) to a 2D Array if this is required, for both option 1 and 2.
I read about flattening in the BMI best practices, and actually I don't really understand it, at least in the context of Julia. I wrote about it in these notes starting from the best practices quote. If flattening is just a reshape/reinterpretation of the same memory, how does flattening a vector avoid row/column major confusion? And in a julia-to-julia BasicModelInterface, I don't see any benefit in flattening, it just destroys the true shape information, but is otherwise functionally the same, especially with Julia's linear indexing into multi dimensional arrays. So in the bmi-example-julia I just returned a Matrix, since I didn't see the point in calling vec(a)
first. I could be missing something there though, then we should update it.
@BSchilperoort maybe you also have some ideas on this, since you also mentioned a flat vector should be returned by BMI.get_value
. Would be good to address these open questions from the BasicModelInterface.jl README.md.
@BSchilperoort maybe you also have some ideas on this, since you also mentioned a flat vector should be returned by BMI.get_value.
The main reason I care about that, is that this is the BMI specification. Exactly following the specification allows others to write code that should work on any model implementing a BMI (e.g. eWaterCycle and grpc4bmi).
For example, in eWaterCycle we have a routine "get_value_as_xarray" which wraps the BMI functions and returns the data in an xarray DataArray. This only works if there is a standard across languages.
in a julia-to-julia BasicModelInterface, I don't see any benefit in flattening, it just destroys the true shape information, but is otherwise functionally the same, especially with Julia's linear indexing into multi dimensional arrays.
Yes, for these cases it's often a hassle to flatten. Same with for example the dest argument being required: this doesn't really make sense for languages such as Python or Julia. However, not all languages work the same and BMI can also be used to couple across languages.
So in the bmi-example-julia I just returned a Matrix, since I didn't see the point in calling vec(a) first. I could be missing something there though, then we should update it.
I had not noticed that yet, but that should then be updated, yes.
In addition, a possible risk with option 1 is that we end up with many different grid types over time, when more variables are added to Wflow.
In my opinion fewer grid types would be better, as I think it will get confusing to users of the BMI quite fast.
Ok yes I was struggling to come up with an example in Julia where the Vector/Matrix difference matters, but when PythonCall.jl converts a Matrix to a 2D numpy array then indeed that can cause issues. Updated the example in https://github.com/csdms/bmi-example-julia/pull/5.
Ok, but for the BMI uniform rectilinear grid type, you need to know if the flattened array is from a row or column major language (the node location functions are not listed as part of the uniform rectilinear grids)? Or am I missing something?
Yes this part is also not that clear to me yet. The BMI best practices state.
It’s the developer’s responsibility to ensure that array information is flattened/redimensionalized in the correct order.
The summary table states what dimensions 1, 2 and 3 should be. If that needs to be true for both row and column-major languages, the data needs to be permuted, which is expensive. If say we want to write to GeoTIFF from Julia we often use (y,x) rather than (x,y) such that the row-major interpretation of that is (x,y) again.
Function | Description -- | -- get_grid_x | Get the locations of a grid’s nodes in dimension 1. get_grid_y | Get the locations of a grid’s nodes in dimension 2. get_grid_z | Get the locations of a grid’s nodes in dimension 3.Ok, but for the BMI uniform rectilinear grid type, you need to know if the flattened array is from a row or column major language (the node location functions are not listed as part of the uniform rectilinear grids)? Or am I missing something?
As far as I can see, this is sadly not specified (yet). There is a discussion about this topic here: https://github.com/csdms/bmi/issues/118
After reading through issue https://github.com/csdms/bmi/issues/118, I am leaning to use unstructured grid types in Wflow for now, at least until this issue has been resolved. I agree that this grid type can be more difficult from a user perspective, on the other hand a student was able to couple Modflow 6 and Wflow without too many issues a couple of years ago. As indeed fewer grid types would be better, I think we need only two grid types:
Although fewer grid types would be better from a user perspective, at the moment it does not seem worth to support this with the use of a buffer (@visr and I concluded this after a brief discussion). For example. to support reservoirs as part of a river grid type, a reference (for example with Base.Ref
) within the buffer to the reservoir variables is required (BMI function get_value_ptr
), or an index mapping between the buffer and reservoir variables for set_value
and set_value_at_indices
is required (in that case the buffer does not contain references, but this is probably not a valid implementation of get_value_ptr
).
For the use of fewer grid types it makes more sense to solve this at the Model
level directly, and not partly in the BMI layer and partly in Wflow. This can be part of another issue/PR.
Hi Bart (@BSchilperoort), do the code changes in branch BMI now work fine for eWaterCycle (Albrecht got the impression it is working now during a workshop with escience this week)? The branch will also go through an internal code review, but would be good to also have some feedback from your side.
Hi Willem, as far as I know the BMI is now complete and compliant with BMI 2.0. It worked great during the workshop. Thank you for implementing these fixes! If we encounter anything else at some point we'll open up an issue.
The only thing not working yet for eWaterCycle is putting the model in a container and communicating with it through gRPC, as a Julia native implementation of a gRPC server does not exist. Trying to pass the BMI through Python or C has led to segfaults and other issues. We'll still have to work on this, and possibly use a different communication protocol. However, this is not related to this PR or any code in Wflow.jl.
Wflow version checks
[X] I have checked that this issue has not already been reported.
[X] I have checked that this bug exists on the latest version of Wflow.
Reproducible Example
Request of grid type seems wrong for some variables. As an example, for
lateral.subsurface.to_river
: request of grid type is incorrect. With BMI.get_var_grid, grid=2 is returned. However, the Vector that is returned with BMI.get_value has the size of grid=4.Current behaviour
BMI.get_value deviates from the BasicModelInterface.jl implementation (
dest
argument is missing). Grid type for some variables seems wrong (see also Reproducible Example). BMI.get_value should always return a flattened array, this is not the case for variables with an extra dimension (e.g.layer
), then a Vector of SVectors is returned.Desired behaviour
BMI.get_value should have
dest
as argument. Grid type should be correct for each variable. This has already been resolved in the zmq_server branch (coupling with OpenDA), but good to include in a separate branch that addresses this issue. BMI.get_value should always return a flattened array (probably need to make use of get_grid_z).Additional Context
These BMI issues were reported during an update of eWaterCycle to support Wflow.jl.