csdms / bmi

The Basic Model Interface is a standardized set of functions allowing coupling of models to models and models to data
https://bmi.readthedocs.io
MIT License
49 stars 17 forks source link

Pre-submission comments on BMI docs #58

Closed gregtucker closed 4 years ago

gregtucker commented 4 years ago

A few comments and suggested edits on BMI docs:

  1. "If the model’s state variables don’t change in time, then they can be computed by the initialize function and this function can just return without doing anything": Would an appropriate alternative design pattern, for situations where there is no time-stepping but a user might conceivably want to re-run a component (for example, after changing one or more variables using set_value) be to have update() re-run the calculations? If so, it's probably worth stating that after the above sentence in the docs.

  2. get_input_item_count(): might be worth adding that this is the # of variables that can be set via set_value (if that's correct); similarly for get_value on get_output_item_count

  3. "A model may have no input [or output] variables": could be mis-read as meaning "may not" equals "not allowed"; suggest "might" instead of "may", and/or adding something like "in which case the return value is empty"

  4. get_var_units: is there an identifier for units that vary depending on context? If so, might be worth a mention (even just to note that this is also covered in UDUNITS, if it is)

  5. get_var_location: the 3 possible returns don't include a scalar (i.e., single value), which makes me think that the variable information functions apply only to grid-based variables. So you could not, for example, use get_value to query the value of a single-valued parameter. I could imagine users getting confused about this, so the working definition of variable is probably worth explaining in the intro to the Variable Information Functions section.

  6. "If the model doesn’t define an end time, a large number (e.g., 1.0e6) is typically chosen." - This makes sense, but I can imagine situations where this could cause issues---e.g., if a model's time units are seconds and it runs for more than a year (~pi x 10^7 sec). Would it make sense to suggest a default equal to the highest floating point number or something like that? Or maybe just an even bigger number: IEEE double-precision max exponent, according to wikipedia, is 308, so could suggest 1.0e308.

  7. "Avoid using years as a unit, if possible, since a year is difficult to define precisely." - a lot of geologic-time models use years... Does UDUNITS define different forms of year, e.g., 365 days, 365.25 days, or a certain number of seconds for a proper astronomical year? If so, could suggest that here as an alternative for those models that insist on using years.

  8. get_value: people might be confused by the reference to "buffer"; call it "the array parameter" instead?

  9. "One grid could be a uniform rectilinear grid on which temperature is defined. A second grid could be a scalar, on which a constant thermal diffusivity is defined." - oh, that's cool! So maybe that's the answer to the question about scalar parameters: define a second grid that contains such-and-such variables. Not sure whether that's a practice we want to recommend, though...?

  10. get_grid_type and following: "This function is needed for every grid type." - not sure what this means

  11. get_grid_rank: is this the same as the # of dimensions in the grid arrays, ie 1d, 2d, or 3d? is it zero for a scalar grid?

  12. get_grid_origin: what if you grid is 2d but its arrays are flat? Seems like it should be rank 1, but the origin should still have two values. I expect this would happen with unstructured grids, for example; might be worth a note somewhere about how this would normally be handled.

  13. The reader has no way of knowing that the cool material under "Additional Topics" exists until they get to the bottom of the function description section. Suggest moving these links, or copying them, under the heading Basic Model Interface, between the end of the text here and Table 2. One way to organize would be a sub-head "BMI Functions" for the existing text, and then "Additional topics" with the links.

  14. Grids: "The grid shape is the number of nodes..." - suggest "refers to the number of nodes", since the shape itself would be # rows and cols (eg). Or specify "number of rows and columns of nodes, as opposed to other types of element (such as cells or faces)"

  15. Great illustrations in the grid section!

  16. CSDMS Modeling Framework: great to refer to this, but I suspect if someone searched for what this is and where to try it out, they wouldn't find anything, because they don't know that CMF basically means "pymt" (these days). How about a sentence pointing them to pymt and its tutorials?

  17. Also, suggest adding a "How to contact us for help incorporating a component" type of thing at the end of the CMF page.

mdpiper commented 4 years ago

I've numbered @gregtucker's comments, just to make them easier to reference as we work through the list.

mcflugen commented 4 years ago
  1. I think what's being said in the docs is actually something more along the lines of, "if a component doesn't have any input variables...". That is, there are no variables that a user can change through set_value. However, in the case of a component that does have input variables, then, definitely, a user could re-run the calculations with a call(s) to set_value followed by a call to update. If there is no time-stepping in the component, it's not clear what the get_time methods would return but, otherwise, I think it would operate just as a time-stepping component would.

  2. Yes, this returns the number of variables that can be set with set_value. It's also used to provide the length of the array used for get_input_var_names in languages like C. I'm not sure we even need this function for languages like Python or C++.

  3. Correct. A model may not have any input variables. However, I can't imagine a model that doesn't have any output variables. What's the point of a model that doesn't provide anything? Maybe an example could be a component that generates a plot or a movie or something like that? I guess there's not need to enforce that a model is required to have output variables.

  4. Are you thinking of something like "L" to represent "units of length" rather than "meters"? We don't have anything like that (and, to my knowledge, neither does UDUNITS). Thus far, we've assumed that get_var_units returned real units and that these units are assumed in get_value and set_value calls. I think it would get pretty confusing otherwise.

  5. I think a additional variable locations should be none (i.e. there is no grid), points (i.e. variables are just defined on nodes with coordinates but the nodes are not connected by edges).

  6. Maybe setting the end time to IEEE +inf? I'm not certain how general that would be between languages and platforms. An alternative could be to introduce a new method (has_end_time?) that indicates if there is an end time or not.

  7. In UDUNITS a "year" is about 365.25 days.

gregtucker commented 4 years ago

@mcflugen for (4) I was thinking situations where you have a formula like:

y = b x^c

where the units of b depend on the value of c. This comes up in empirical hydrology formulas, such as the oft-used expression for channel width (y) as a function of discharge (x). It's admittedly ugly to have coefficients with weird, exponent-dependent units, but seems to be common practice.

mdpiper commented 4 years ago
  1. I don't see how the model time would be reset without calling initialize. My sense is that if you want to rerun a model, it's best to, at minimum, finalize and reinitialize, but safer to create a separate instance of the model. Edit -- @mcflugen & I talked about this & I better understand now. He may add some text to docs.
  2. Added text to descriptions of get_input_item_count and get_output_item_count.
  3. Corrected.
  4. We can break this out as an issue / feature request (#60).
  5. I'll add this as an issue (#61) to be fixed in the next BMI release. For now, I've added text stating that for a scalar variable, the return from this function is ignored.
  6. Changed 1e6 (an Austin Powers reference!) to "the largest floating point number supported on your platform".
  7. Added the UDUNITS values for a year in days and seconds, for reference.
  8. Replaced "buffer" with "array parameter".
  9. Yup! To reinforce this point, I added a new item to the best practices list.
  10. Struck that text.
  11. Yup! I added some text to this effect, but also check the linked term "rank" in the glossary.
  12. get_grid_origin is only used for uniform rectilinear grids.
  13. I see your point. However, with the exception of the best practices doc, I think that all of these topics represent anciallary information. The grids section, as well as the glossary and references, are linked to throughout the BMI function descriptions. Also, I'm trying to soft-pedal references to CSDMS and pymt, because I'd like BMI to stand on its own, outside of our influence. I did add a sentence linking to the best practices doc.
  14. Used suggested correction.
  15. That's all Eric's artistry.
  16. Updated this section to refer to pymt instead of the CMF. Moved some info to the best practices doc.
  17. Added a Help section after Additional Topics on the main page.

I've gathered all these changes into a PR, #59.

gregtucker commented 4 years ago

+1