NCAS-CMS / cfa-conventions

NetCDF Climate and Forecast Aggregation (CFA) Conventions
https://github.com/NCAS-CMS/cfa-conventions/blob/main/source/cfa.md

JMG comments #2

Closed davidhassell closed 3 years ago

davidhassell commented 3 years ago

A record of some off-line conversations that led to this PR.

[JMG] index. I'm not sure I understand. Apparently, for every combination of fragment indices, you need to specify another set of fragment indices. Do you mean that the fragments, although forming a multidimensional array, can be completely jumbled up? If so, it would help to say so. Then, for all the other quantities, which set of indices are we using: the one before or the one after this index transformation?

[DH] Yes. Since the location, file, and address variables are dimensioned by the fragment dimensions, we know the index by definition. Therefore, the index variable can go.
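A minimal sketch of why a separate index variable is redundant, assuming the instruction variables are held as NumPy arrays dimensioned by the fragment dimensions. The array contents and file names are hypothetical and not taken from the CFA specification:

```python
import numpy as np

# Hypothetical 2 x 3 array of fragments; each element names a fragment file.
file = np.array([["a.nc", "b.nc", "c.nc"],
                 ["d.nc", "e.nc", "f.nc"]])

# The fragment index is simply the element's position in the array,
# so it does not need to be stored in a variable of its own.
fragment_index = {str(name): idx for idx, name in np.ndenumerate(file)}
```

Any other variable with the same fragment dimensions (location, address) can be looked up with the same index tuple.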

[JMG] location. Why do we need to specify these for each fragment? I would have thought there would be one set of ranges along each dimension.

[DH] I think that the simplicity and convenience of making all of the "instruction" variables span the same dimensions is nice. However, I'd be happy either way. I'll leave it as it is, for now, though.

If and only if each fragment had the same hyper-cubic shape (e.g. (10, 10, 10, 10)) then they could be jumbled up, with judicious use of the locations, but I think that this special case should be disallowed. I.e. each fragment must occupy its "correct" position in the orthogonal multidimensional array of fragments.
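To illustrate the two options discussed here, the following sketch expands one set of fragment sizes per dimension (JMG's suggestion) into a location for every fragment (the per-fragment form). It assumes the fragments sit on an orthogonal grid; the function name `fragment_locations` and the example sizes are hypothetical, and this is not the encoding used by the conventions:

```python
import itertools
import numpy as np

def fragment_locations(sizes_per_dim):
    """Map each fragment index to its (start, stop) range along every dimension.

    sizes_per_dim: one list of fragment sizes per aggregated dimension.
    """
    # Cumulative boundaries along each dimension, e.g. [3, 4] -> [0, 3, 7]
    bounds = [np.concatenate(([0], np.cumsum(s))) for s in sizes_per_dim]
    locations = {}
    for idx in itertools.product(*(range(len(s)) for s in sizes_per_dim)):
        locations[idx] = tuple(
            (int(b[i]), int(b[i + 1])) for b, i in zip(bounds, idx)
        )
    return locations

# Two fragments along the first dimension, three along the second.
locs = fragment_locations([[3, 4], [5, 5, 2]])
```

Under this assumption both forms carry the same information; the per-fragment form just repeats it so that every instruction variable spans the same dimensions.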

[JMG] Can missing_data, _FillValue, scale and offset differ among the fragments? What happens to valid and actual ranges? What about data types? Are other attributes allowed on fragment variables (if netCDF), and if present are they ignored or used in some way?

[DH] These are excellent questions. The fragments can define their masked points and compression/packing in any way they like, and it is up to the parent variable to consolidate them into an array with well defined missing_data, _FillValue, valid and actual ranges. I think that compression and packing should be disallowed on aggregated arrays.
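A rough sketch of what such consolidation could look like, assuming each fragment is unpacked and masked with its own attributes before being placed in the aggregated array. The helper `unpack` and the example values are hypothetical; the conventions do not prescribe this procedure:

```python
import numpy as np

def unpack(raw, fill_value, scale_factor=1.0, add_offset=0.0):
    """Return a masked float64 array from one fragment's stored data."""
    raw = np.asarray(raw)
    mask = raw == fill_value  # mask is decided on the stored values
    data = raw.astype("float64") * scale_factor + add_offset
    return np.ma.masked_array(data, mask=mask)

# Two hypothetical fragments with different fill values and packing.
frag0 = unpack(np.array([1, 2, -99], dtype="int16"), fill_value=-99,
               scale_factor=0.5, add_offset=10.0)
frag1 = unpack(np.array([4.0, 1e20, 6.0]), fill_value=1e20)

# The aggregated array ends up with a single data type and a single mask.
aggregated = np.ma.concatenate([frag0, frag1])
```

The aggregated variable can then declare its own missing-data attributes without reference to how each fragment stored them.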

[JMG] file. If a fragment has fewer versions than others then the trailing dimension must be padded with missing values. Did you consider the alternative of having just one string (no trailing dimension) for each fragment, with a delimiter?

[DH] Tempting, though I'm not sure how to ensure that the delimiter is not a valid file name character (such as " " (space)).

[JMG] I agree that space would be unsafe. Is ASCII zero (NUL) used as the end-of-string delimiter for netCDF strings? If not, you could use that. Otherwise, there are several other ASCII control characters which could be used, such as 0A (LF). If someone uses that in a filename, they deserve any problems they encounter.
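A small sketch of the single-string alternative, assuming LF (0A) is chosen as the delimiter as suggested above. The function names and file paths are hypothetical:

```python
DELIMITER = "\n"  # ASCII 0A (LF); an assumed choice, not part of the conventions

def join_versions(filenames):
    """Pack all versions of one fragment's file name into a single string."""
    for name in filenames:
        if DELIMITER in name:
            raise ValueError(f"file name contains the delimiter: {name!r}")
    return DELIMITER.join(filenames)

def split_versions(packed):
    """Recover the list of file names; an empty string means no versions."""
    return packed.split(DELIMITER) if packed else []

# File names containing spaces survive the round trip.
packed = join_versions(["/data/frag 1.nc", "backup/frag 1.nc"])
```

With this form a fragment with fewer versions simply has a shorter string, so no padding with missing values is needed.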

davidhassell commented 3 years ago

~If and only if each fragment had the same hyper-cubic shape (e.g. (10, 10, 10, 10)) then they could be jumbled up, with judicious use of the locations, but I think that this special case should be disallowed. I.e. each fragment must occupy its "correct" position in the orthogonal multidimensional array of fragments.~

~This is incorrect. The fragments can be in any order, since in fragment-space each fragment effectively has size one in every dimension, and the location term maps each fragment to a location in the aggregated data independently of the others.~

~Therefore we do not need to state that the fragments must be in any order.~

Edit: sorry - scrub this from the record!

davidhassell commented 3 years ago

Oh. I think I've been too hasty. Please ignore https://github.com/davidhassell/cfa-conventions/pull/2#issuecomment-826597506. Sorry!