cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Use of "most rapidly varying" #530

Closed ChrisBarker-NOAA closed 2 weeks ago

ChrisBarker-NOAA commented 2 months ago

Clarify use of "most rapidly varying" dimension.

In (at least) three places in CF, we refer to the "most rapidly varying" dimension (and we are thinking of adding a fourth, in the cell definition, discussed in #163).

I'm enough of a computer geek to know what this means, though I'm not (wasn't) sure quite how it applied to CF.

e.g. "'most rapidly varying' index to mean the one which varies by 1 for the addresses of adjacent locations in storage, i.e. the first index in Fortran, the last in C and CDL"

If, in fact, it's always the last in CDL (and in netCDF itself), then I think this language could not only be confusing to folks less familiar with the intricacies of array storage, but could also send people down the wrong track if they are, e.g., writing a file with Fortran, and might think that "most rapidly varying" means the first index, as it is in Fortran.

The three places I found "rapidly varying"

Now that I've written this all out -- maybe the only thing to do is adjust the text in 7.1, which is being worked on right now in #163 (PR #521).

However, maybe it would be good to state somewhere in the spec that "the most rapidly varying" dimension is always the last in a netCDF file? I'm sure that's defined in the netCDF spec itself, but having it in CF could be helpful.

NOTE: there may be other places to look at in the doc -- I only found these three by searching for "rapidly varying".

Moderator

TBA

davidhassell commented 2 months ago

Hi Chris,

Thanks for describing this so clearly! Would a new entry in 1.3 terminology be sufficient? e.g. something like "most rapidly varying dimension: <definition>", and then make sure we use that exact phrase elsewhere in the text:

I notice that in UGRID (which is also CF, now) there is sometimes the option to specify which dimension is the most rapidly varying, e.g.

The face_dimension attribute specifies which netcdf dimension is used to indicate the index of the face in the connectivity arrays. This is needed because some applications store the data with the fastest varying index first, and some with that index last. The default is to use the num_faces as fastest dimension; e.g. a (num_faces, 3) array for triangles, but some applications might use a (3, num_faces) order, in which case the face_dimension attribute is required to help the client code disambiguate. The edge_dimension attribute is similar for the edge connectivity arrays.

If I understand correctly, given that in CDL/netCDF the most rapidly varying dimension is the last one, this description is misleading. It implies that the face index is always the slowest varying dimension, but that it can be in either position. Not so, right?
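To make the two layouts concrete, here is a minimal sketch in plain Python. It assumes netCDF/CDL storage order (last dimension varies fastest) and uses a made-up connectivity table for two triangles; the names `face_nodes`, `nodes_face` and `flatten_row_major` are hypothetical and do not come from UGRID or CF.

```python
def flatten_row_major(rows):
    """Return the values in the order they sit in storage when the
    last dimension varies fastest (CDL/C order)."""
    return [value for row in rows for value in row]


# Hypothetical connectivity for two triangles, dimensioned (num_faces, 3).
face_nodes = [[0, 1, 2], [1, 3, 2]]

# The same table dimensioned (3, num_faces), i.e. transposed.
nodes_face = [list(column) for column in zip(*face_nodes)]

# (num_faces, 3): the three nodes of one face are adjacent in storage,
# so the face index is the slowest-varying one.
storage_faces_first = flatten_row_major(face_nodes)

# (3, num_faces): consecutive elements step through the faces,
# so the face index is the most rapidly varying one.
storage_faces_last = flatten_row_major(nodes_face)

print(storage_faces_first)  # [0, 1, 2, 1, 3, 2]
print(storage_faces_last)   # [0, 1, 1, 3, 2, 2]
```

In this sketch the face index is only the fastest-varying one when `num_faces` is the last dimension, which is the point of the question above.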

JonathanGregory commented 2 months ago

Dear Chris and David

Thanks for addressing this issue. I agree with defining "most rapidly varying dimension" in 1.3 (David's suggestion) and I agree also with saying "last" in the text (Chris's suggestion). In addition, I suggest we should clarify "last" in the text. That is, I think we should say what we mean in more than one way, consistently each time. That's redundancy in the text, but ought to help with clarity so long as we maintain consistency. My proposal for the three cases is:

In 1.3, we could insert a definition like this:

most rapidly varying dimension: The dimension of a multidimensional variable whose index differs by unity (modulo dimension size) for elements that are adjacent in storage. When netCDF is represented in CDL, the most rapidly varying dimension is the last one e.g. x in float data(z,y,x). C and Python NumPy use the same order as CDL, also called "row-major order", but Fortran uses the opposite convention, also called "column-major order", so that when netCDF variables are accessed in Fortran the most rapidly varying dimension is the first one.
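The definition can be checked with a small sketch in plain Python that computes where an element lands in flat storage under each convention. The helper names and the dimension sizes are hypothetical, chosen only for illustration; nothing here is taken from the netCDF libraries.

```python
def row_major_offset(index, shape):
    """Flat storage offset when the LAST index varies fastest (C, CDL)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Flat storage offset when the FIRST index varies fastest (Fortran)."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


shape = (2, 3, 4)  # hypothetical sizes for float data(z, y, x) in CDL

# In CDL order, stepping x (the last index) moves to the adjacent element.
step_x = row_major_offset((0, 0, 1), shape) - row_major_offset((0, 0, 0), shape)

# Stepping z (the first index) jumps over a whole y-x plane.
step_z = row_major_offset((1, 0, 0), shape) - row_major_offset((0, 0, 0), shape)

# Fortran sees the same variable as data(x, y, z): the shape is reversed
# and the first index is now the one that moves to the adjacent element.
fortran_shape = tuple(reversed(shape))
step_first = (column_major_offset((1, 0, 0), fortran_shape)
              - column_major_offset((0, 0, 0), fortran_shape))

print(step_x, step_z, step_first)  # 1 12 1
```

The storage layout is the same in both views; only the order in which the indices are written differs.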

How's that?

Best wishes

Jonathan

taylor13 commented 1 month ago

lovely (and clear), from my perspective.

ChrisBarker-NOAA commented 1 month ago

This looks good to me, thanks!

One nit:

in 1.5 "...COARDS restricts the axis (equivalently dimension) ordering to be longitude, latitude, vertical, and time, with longitude being the last dimension in CDL order (the most rapidly varying dimension)."

Wouldn't that be (time, vertical, latitude, longitude) in CDL order? So it's a bit confusing to have it in the opposite order in the text. I know I follow an example before I carefully read the text! Was COARDS originally written with Fortran in mind?
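As a rough illustration of the CDL ordering in question, the sketch below computes how far apart consecutive values along each dimension sit in storage, assuming the last dimension varies fastest. The dimension sizes and the helper `row_major_strides` are hypothetical.

```python
def row_major_strides(sizes):
    """Element strides per dimension when the last dimension varies fastest."""
    strides = []
    step = 1
    for n in reversed(sizes):
        strides.append(step)
        step *= n
    return list(reversed(strides))


# CDL order for a hypothetical variable data(time, vertical, latitude, longitude).
dims = ["time", "vertical", "latitude", "longitude"]
sizes = [2, 3, 4, 5]  # made-up dimension sizes

strides = dict(zip(dims, row_major_strides(sizes)))
print(strides)
# {'time': 60, 'vertical': 20, 'latitude': 5, 'longitude': 1}
```

Longitude, written last in CDL, has a stride of one element, so it is the most rapidly varying dimension; time, written first, has the largest stride.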

JonathanGregory commented 1 month ago

Those words have not changed, @ChrisBarker-NOAA, but I agree that it would be logical to put it in CDL order - good point. I don't know what software environment the authors of COARDS had in mind! Is this OK:

Since we would not be quoting COARDS verbatim, I have rephrased it, in the hope (though not the certainty) of making it clearer.

ChrisBarker-NOAA commented 1 month ago

Thanks! I think that's better, yes.

davidhassell commented 1 month ago

Hi,

This is looking good to me, thanks. A couple of questions:

JonathanGregory commented 1 month ago

Dear @davidhassell

Best wishes

Jonathan

ChrisBarker-NOAA commented 1 month ago

I like @davidhassell's wording :-)

JonathanGregory commented 3 weeks ago

More than three weeks have passed with no further comment. I have prepared PR #535 to implement these changes, as I drafted, with the subsequent changes by @ChrisBarker-NOAA and @davidhassell. Please could someone check and merge. Thanks.