Consider alternatives for the terms defined in the glossary

davidhassell commented 3 years ago

Now that there is a glossary of terms (#9), we can discuss if those terms have the most appropriate names.

Here's what we currently have:

aggregated data
The data of an aggregated variable that exists as a set of instructions on how to build an array from one or more other arrays stored elsewhere.

aggregated variable
A netCDF variable that does not contain its own data, rather it contains instructions on how to create its data as an aggregation of data from other sources.

fragment
An independent, possibly self-describing, array that defines a contiguous part of the aggregated data. The aggregated data is composed from a multi-dimensional orthogonal array of fragments.

fragment dimension
A dimension of the multi-dimensional orthogonal array of fragments that defines the aggregated data.

parent file
The netCDF file that contains the aggregated variable, and may also contain some or all of the fragments.
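
To make the "multi-dimensional orthogonal array of fragments" wording concrete, here is a rough sketch of how an application might build the aggregated data from a 2-D grid of fragments. It is only an illustration under stated assumptions: the function name `aggregate`, the use of nested Python lists for arrays, and the restriction to two fragment dimensions are all hypothetical and are not part of the conventions text.

```python
# Illustrative sketch only: `aggregate` and the nested-list layout are
# assumptions made for this example, not anything defined by the conventions.

def aggregate(fragments):
    """Build a 2-D array from a 2-D orthogonal grid of fragments.

    `fragments[i][j]` is itself a 2-D array (a list of rows). The two
    indices i and j play the role of the two fragment dimensions.
    """
    aggregated = []
    for grid_row in fragments:
        # All fragments at the same position along the first fragment
        # dimension must have the same number of rows.
        n_rows = len(grid_row[0])
        if any(len(frag) != n_rows for frag in grid_row):
            raise ValueError("fragments in one grid row differ in height")
        for r in range(n_rows):
            row = []
            for frag in grid_row:
                row.extend(frag[r])  # place fragments side by side
            aggregated.append(row)
    # Every row of the result must have the same length.
    if len({len(row) for row in aggregated}) > 1:
        raise ValueError("aggregated rows differ in length")
    return aggregated


# A 2 x 2 grid of fragments with sizes 2x2, 2x1, 1x2 and 1x1.
fragments = [
    [[[1, 2], [3, 4]], [[5], [6]]],
    [[[7, 8]], [[9]]],
]
aggregated_data = aggregate(fragments)
```

In this picture each fragment defines one contiguous block of the result, and the positions in the outer grid correspond to the fragment dimensions.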

bnlawrence commented 3 years ago

I think you wanted comments on the terms: for me,

"aggregated data" implies "has been aggregated", whereas the usage here is "set of instructions on how". I wonder if "aggregation view" might not be better, or "aggregation data"?
similarly, "aggregation variable" rather than "aggregated variable"?
fragment and fragment dimension are consistent with usage elsewhere (e.g. ESDM)
parent "file"? Do we have a concept of the parent separate to it being serialised and (presumably) in storage?

JonathanGregory commented 3 years ago

I agree with Bryan about this difficulty and his proposal to replace "aggregated" with "aggregation". J

"aggregated data" implies "has been aggregated", whereas the usage here is "set of instructions on how". I wonder if "aggregation view" might not be better, or "aggregation data"?

similarly, "aggregation variable" rather than "aggregated variable"?

davidhassell commented 3 years ago

Hi @bnlawrence

parent "file"? Do we have a concept of the parent separate to it being serialised and (presumably) in storage?

I'm not 100% sure what you mean here, but I'm thinking that this concept of a "parent file/dataset/thing" is not really needed, after all.

When a fragment lives in the parent file/dataset/thing (this'll be a tough sentence to parse, I think!), it's really just a special case where the fragment file happens to coincide with one of the files, excluding fragment files, that compose the dataset containing the aggregated variable.

This careful wording is meant to cover the case where an external aggregated cell measures variable has fragments in the file of the related data variable.

I have submitted PR #16 with some first ideas on how this could work.

davidhassell commented 3 years ago

Hi @bnlawrence and @JonathanGregory - some more comments:

"aggregated data" implies "has been aggregated", whereas the usage here is "set of instructions on how". I wonder if "aggregation view" might not be better, or "aggregation data"?

Hmm. I managed to define it twice, contradicting myself:

From the first paragraph:
"When created by an application program, the data of an aggregation variable is called its aggregated data."

From the glossary:
"The data of an aggregation variable that exists as a set of instructions on how to build an array from one or more other arrays stored elsewhere."

I probably prefer the first definition, and a re-worked glossary entry of "The data of an aggregation variable, as created by an application program from one or more fragments."

I think that the concept of the data as a created entity naturally lends itself to the detailed descriptions that follow.

What do you think?

similarly, "aggregation variable" rather than "aggregated variable"?

Sounds good. I have updated PR #16 for this.

fragment and fragment dimension are consistent with usage elsewhere (e.g. ESDM)

Sounds good.

In the fragment description I said "An independent, possibly self-describing, array ...". Should the word "possibly" be removed?

parent "file"? Do we have a concept of the parent separate to it being serialised and (presumably) in storage?

See https://github.com/davidhassell/cfa-conventions/issues/13#issuecomment-832555613

JonathanGregory commented 3 years ago

Dear David

I agree that the first definition is more logical. So an aggregation variable contains aggregated data (once it's been realised)? What do you call the instruction data?

Best wishes

Jonathan

davidhassell commented 3 years ago

Dear Jonathan,

Thanks.

So an aggregation variable contains aggregated data (once it's been realised)?

Yes, that certainly makes sense if you imagine your variable as being an object in memory. Once the aggregated data has been realised, you are in the same situation as you would have been in had the variable instead been a normal, non-aggregated variable.

Currently we don't have a formal name for the instructions; they're just a combination of the contents of the aggregated_dimensions and aggregated_data metadata attributes. I can't decide if we need to give that a name - what do you think?

JonathanGregory commented 3 years ago

I suppose that depends on whether you find yourself wanting to refer to it (the instructions). In that case it needs a name to answer to.

davidhassell commented 3 years ago

OK, thanks - sounds fair enough. Right now I don't think that I've had the need to refer to it, but I'll review the doc and see if there are any cases where such a reference could perhaps have been used to improve readability.

NCAS-CMS / cfa-conventions

Consider alternatives for the terms defined in the glossary #13