cf-convention / vocabularies

Issues and source files for CF controlled vocabularies

Standard name "namespaces" #26

Open larsbarring opened 2 years ago

larsbarring commented 2 years ago

Here is a thought sparked off by some recent (and earlier) issues: Might it be useful to add some kind of "namespace" capability to the standard name machinery?

Example 1: In cf-convention/vocabularies#25 there is a request for standard names clearly distinguishing between fresh water variables and the currently existing sea variables. The discussion in that issue shows that this is not as easy as it may seem, because CF in this context has defined "sea" to also include fresh water bodies.

Example 2: In cf-convention/vocabularies#142 standard names for wildfire variables are requested. Some of these are well established and widely used in many countries, others are more specific to a particular region/application/organisation. For the latter, the solution suggested is to add what essentially amounts to a "namespace identifier" at the beginning of the standard name.

Example 3: In my own work on various climate indices, a core set was established by WMO-supported expert teams (ETCCDI and ET-SCI). Several of these indices have very well established names, while it is more difficult to find a suitable standard name.

The first example shows that as earth system modelling ("climate modelling") becomes more sophisticated over the years, concepts established at one time need to be refined to become more fine grained. And this may be difficult as the standard name machinery is now set up.

The second and third examples show that as CF becomes more established and popular, and is put to new uses that sometimes are not "universal" either in global reach or across all [relevant] disciplines, there is a need to somehow establish this context.

One way to do this is shown in cf-convention/vocabularies#142. That is, to simply attach an identifier at the beginning of the standard name. However, I feel that all this is something to think about, and discuss, in more general terms before it establishes itself as the way to do it. While it seems like an easy solution now, I am not sure it is the best solution in the long run. Anyway, by opening this issue I do not in any way want to stall progress/conclusion of cf-convention/vocabularies#142. If this discussion ends with another solution, it is always possible to deprecate standard names.

sethmcg commented 2 years ago

I think this is a good idea to discuss. A few issues that come to mind:

1) How could we implement it, and particularly, how could we do so in a backwards-compatible way? A very simple approach would be to add an attribute named something like "standard_namespace", but I think that creates a bigger opportunity for names and namespaces to get out of sync. I think something fully self-contained within the standard_name would be preferable.

The "implicit prefix namespace" as in cf-convention/vocabularies#142 has the advantage of being fully transparent to existing software and practices. The other option I see would be to add a delimiter of some kind between namespace and name; something like namespace::standard_name would be familiar and easy to understand for most programmers. The downside of adding a delimiter (of any type) is that it will probably break a lot of software that assumes the standard_name string consists only of characters in [a-z_0-9].

2) For backwards compatibility, there will have to be some kind of default global namespace to assume when no namespace is declared. Consequently, we should probably create some specifications about how users can navigate legacy data that has no declared namespace and standard_names that have since been moved out of the global namespace and into some contextual namespace.

3) Are standard_names allowed to shadow one another in different namespaces? It would surely be convenient for end-users to be able to use shorter, simpler standard_names, and it would also simplify the process of proposing new standard_names if they don't have to be globally unique, but only unique to the namespace. But more guidance would then be needed on how to identify and resolve shadowing.

4) How would we handle the issue that some standard_names will have hierarchical relationships with others in different namespaces? For example, as in the case of cf-convention/vocabularies#25, while some users will want to be able to distinguish between sea water and fresh water in their data, there will be others who have data that does not differentiate the two, and therefore must be able to lump them together. How and where would that relationship be defined and recorded?
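To make points 1) to 3) concrete, here is a minimal sketch in Python of how a delimiter-based scheme might behave. It is not part of CF or of any existing library. The `::` delimiter is the one floated above, while the default namespace label `cf`, the namespaces `wildfire` and `lake`, and the qualified names used in the usage note are all hypothetical. The sketch assumes that legacy software validates standard_name against the character set [a-z_0-9].

```python
import re

# Assumption: legacy software accepts only these characters in a standard_name.
LEGACY_NAME = re.compile(r"[a-z0-9_]+")
DEFAULT_NAMESPACE = "cf"  # hypothetical label for the default global namespace
DELIMITER = "::"          # delimiter floated in this thread; not defined by CF


def is_legacy_safe(value):
    """True if the whole string uses only the legacy character set."""
    return LEGACY_NAME.fullmatch(value) is not None


def split_standard_name(value):
    """Return (namespace, name); undelimited names go to the default namespace."""
    namespace, sep, name = value.rpartition(DELIMITER)
    if not sep:
        namespace = DEFAULT_NAMESPACE
    if not (is_legacy_safe(namespace) and is_legacy_safe(name)):
        raise ValueError(f"malformed standard_name: {value!r}")
    return namespace, name


def shadowed_names(qualified_names):
    """Map each bare name defined in more than one namespace to those namespaces."""
    seen = {}
    for value in qualified_names:
        namespace, name = split_standard_name(value)
        seen.setdefault(name, set()).add(namespace)
    return {name: spaces for name, spaces in seen.items() if len(spaces) > 1}
```

Under these assumptions, `split_standard_name("sea_water_temperature")` falls back to the default namespace, whereas `is_legacy_safe("wildfire::fire_danger_index")` is false, which is the breakage described in point 1). Calling `shadowed_names` on a list containing both `water_temperature` and `lake::water_temperature` would report that name as shadowed, which is one possible starting point for the guidance asked for in point 3).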

I expect that relevant work on these subjects already exists in the information science literature on ontologies. Does anyone know of any references we should be looking at?

JonathanGregory commented 2 years ago

Dear Lars and Seth

I think it's fine to consider this idea (as I believe we have done before). For certain very specific sets of names, especially those which refer to a vocabulary controlled by another authority, I think it's appropriate to identify the namespace somehow, as we currently do informally in some cases with some prefix to standard names, such as in #166. As @larsbarring suggests, it's useful to talk about when this is appropriate and how namespaces should be indicated. In general I don't think we ought to make much use of namespaces within the standard name table, because one of the major aims of standard names is to be interdisciplinary. I prefer most names to be in the default namespace, as @sethmcg calls it. If we allow namespaces,

I believe that both of these likely consequences would undermine the purpose of standard names. It's definitely more work for us to discuss standard names in the broad community and agree what they mean in a way that everyone can understand, as we have always done, but I think that's precisely why they are valuable and widely used.

Best wishes

Jonathan