cf-convention / cf-conventions

http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Allow a standard name `alias` to have more than one `entry_id` #132

Closed: mattben closed this issue 5 months ago

mattben commented 6 years ago

[This issue was originally entitled "TRAC cf-convention/discuss#155: Invalid "id" values in CF Standard Name aliasses"]

Running an XML schema check on the CF standard name list, I found the following minor issues (minor because they relate to aliases, not to the standard name definitions):

There are spurious spaces in these ids:

EDIT 2024-01-19: Changed the top link to correctly point to the Trac ticket /@larsbarring

mattben commented 6 years ago

Dear Martin

This looks sensible to me. Alison's comment would be useful on this one too. Changing the standard names rectifies a defect, I agree, but I think that changing the schema should be treated as a proposal for substantial change to the convention.

Best wishes

Jonathan

mattben commented 6 years ago

Dear Martin,

Sorry for not getting to this ticket sooner!

I'm not sure I agree with changing the ids with "spurious spaces". The problem is that when the names were first published they did accidentally contain spaces - the aliases were introduced to correct the mistake (in the same way as we would do for a simple spelling mistake). The versions of the names containing spaces had been around for quite a long time before they were noticed. "rateof hydroxyl_radical_destruction_due_to_reaction_with_nmvoc" appeared in versions 28 - 36 of the standard name table, spanning a period of 18 months in 2015-16. The other four appeared in versions 8 - 10 spanning a 7 month period in 2008. It is possible that during those periods data files were written containing the erroneous names. To avoid invalidating such files I thought it was better to use aliases rather than just quietly delete the problem! I could of course simply delete the aliases if that is generally felt to be acceptable, but that would mean treating typos involving spaces differently from any other minor error that might crop up in standard names.

Regarding the other alias that points to two current names, this again was done to avoid possibly invalidating existing data files. The original name, surface_carbon_dioxide_mole_flux, contained no indication of sign convention and this was felt not to be satisfactory. That particular name dates back to pre-version 1 of the standard name table and the aliases weren't introduced until version 15, a period of at least 2006 - 2010. Data files could have been written during that period using either upwards positive or downwards positive as a sign convention and both would have been valid CF at the time. I support the idea of changing the schema to make this use of aliases valid - such a use case was probably not envisaged when the schema was created but the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid.

Best wishes,

Alison

graybeal commented 6 years ago

I agree with "the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid", but I do not agree that the original meaning of the data has been preserved by aliasing it to two identifiers.

Anyone who used the original identifier undoubtedly had one of those two identifiers in mind, but we have not clarified the intended meaning through this process. I'm sorry I missed this topic first time around, and it isn't worth getting up in arms about, but the original term has a clearly different meaning and application than either of its referenced replacements.

JonathanGregory commented 2 years ago

Dear all

Two points were made in this issue at the outset, but I believe that only one remains, so I have changed the title accordingly. The proposal is to change the standard name schema to permit an alias to have more than one entry_id, given that there is one use case for this. Have there been any subsequent discussions elsewhere about this? Where is the standard name schema CFStandardNameTable.xsd kept?

Jonathan

larsbarring commented 9 months ago

I fully support the change proposed by @mattben when opening this issue. If I understand Matthew's comment correctly as relaying a response, @japamment also supports this (cf. the last few lines of that comment).

JonathanGregory commented 9 months ago

I support this change as well.

davidhassell commented 9 months ago

I also support the change that permits an alias to have more than one entry_id.

Are the implications for known software that uses the schema (such as the standard names editor, I presume) easy to deal with?

larsbarring commented 9 months ago

The change required to the xml schema (xsd file) is really small:

    <xs:element name="alias">
        <xs:annotation>
            <xs:documentation>The alias element contains one or more entry_id elements
                               with the id of the entry containing the definition. It is intended as
                               a mechanism for modifying standard names in a backward-compatible
                               fashion. Typically, there is one entry_id, but in a few instances
                               there are two entry_id elements, for example if a standard name is divided
                               into upwards and downwards alternatives.</xs:documentation>
        </xs:annotation>
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="entry_id" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required"/>
        </xs:complexType>
    </xs:element>

(I have added line breaks to the annotation for readability.) The only change needed is the addition of the maxOccurs="unbounded" attribute; I have also amended the annotation text.

However, changes are needed to other parts of the processing chain; see this comment.
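To illustrate what the one-attribute change does: in XML Schema, maxOccurs defaults to 1, so under the old schema an alias carrying two entry_id children fails validation, while with maxOccurs="unbounded" it passes. Python's standard library cannot validate against an XSD, so the sketch below only mimics that occurrence constraint using xml.etree.ElementTree. The table fragment and the alias_violations helper are hypothetical, and the two entry ids standing in for the split surface_carbon_dioxide_mole_flux alias are assumed for illustration rather than copied from a published table.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a standard name table. The alias id and entry ids
# are illustrative only; they are not copied from a published table version.
TABLE = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="some_old_name">
    <entry_id>some_new_name</entry_id>
  </alias>
</standard_name_table>"""


def alias_violations(root, max_occurs=1):
    """Return ids of aliases whose entry_id count lies outside [1, max_occurs].

    max_occurs=None stands in for maxOccurs="unbounded". XSD minOccurs
    defaults to 1, so an alias with no entry_id is always reported.
    """
    bad = []
    for alias in root.iter("alias"):
        n = len(alias.findall("entry_id"))
        if n < 1 or (max_occurs is not None and n > max_occurs):
            bad.append(alias.get("id"))
    return bad


root = ET.fromstring(TABLE)
print(alias_violations(root))                   # old schema (maxOccurs defaults to 1)
# ['surface_carbon_dioxide_mole_flux']
print(alias_violations(root, max_occurs=None))  # new schema (maxOccurs="unbounded")
# []
```

The single-target alias passes under both settings, so the relaxed schema stays backward compatible with every existing alias.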

japamment commented 7 months ago

Hi @larsbarring, I support this change as there is a clear use case for allowing one alias to map to two standard names, as demonstrated in the original proposal.

If I have understood correctly, cf-convention/cf-conventions/issues/509 and the associated pull request, cf-convention/cf-conventions/pull/510 will update Appendix B to be consistent with this issue. I support those too.

Regarding the xml file, a modification to the CEDA standard names editor will be needed to allow it to output pairs of entry_id tags associated with a single alias_id. As a temporary measure until the editor is updated, we can apply a post-processing script to the xml file to achieve the same result. I will prepare a suitable script ahead of the next standard name table update, so once cf-convention/cf-conventions/issues/509 is closed I think this issue can also be closed.
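A post-processing step of the kind described above might look roughly like the following. This is only a sketch under assumptions the thread does not state: it assumes the editor writes each alias with a single entry_id, and that the additional targets come from a hand-maintained mapping. EXTRA_ENTRY_IDS and add_entry_ids are hypothetical names, not part of the CEDA editor or any published CF tool, and the entry ids are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-maintained mapping, used until the editor can emit several
# entry_id tags per alias: alias id -> every entry id the alias should point to.
EXTRA_ENTRY_IDS = {
    "surface_carbon_dioxide_mole_flux": [
        "surface_upward_mole_flux_of_carbon_dioxide",
        "surface_downward_mole_flux_of_carbon_dioxide",
    ],
}


def add_entry_ids(root, mapping):
    """Ensure each alias listed in `mapping` carries all of its target entry ids.

    Existing entry_id children are kept; missing ones are appended in the
    order given, so running the script twice creates no duplicates.
    """
    # Materialise the list first: the tree is modified inside the loop.
    for alias in list(root.iter("alias")):
        targets = mapping.get(alias.get("id"))
        if not targets:
            continue
        present = {e.text for e in alias.findall("entry_id")}
        for target in targets:
            if target not in present:
                ET.SubElement(alias, "entry_id").text = target
                present.add(target)
    return root


# Example input: an alias as a single-target editor might write it.
root = ET.fromstring(
    '<standard_name_table>'
    '<alias id="surface_carbon_dioxide_mole_flux">'
    '<entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>'
    '</alias>'
    '</standard_name_table>'
)
add_entry_ids(root, EXTRA_ENTRY_IDS)
add_entry_ids(root, EXTRA_ENTRY_IDS)  # second run changes nothing
print([e.text for e in root.iter("entry_id")])
# ['surface_upward_mole_flux_of_carbon_dioxide', 'surface_downward_mole_flux_of_carbon_dioxide']
```

Because targets are only appended when absent, such a script could be re-run on every table release until the editor itself supports multiple entry_id tags.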

larsbarring commented 5 months ago

The issue with double aliases has been resolved in https://github.com/cf-convention/cf-conventions/issues/509. Standard names with a spurious space have been discussed in https://github.com/orgs/cf-convention/discussions/310 with a unanimous outcome, and will be resolved in https://github.com/cf-convention/vocabularies/issues/7. Hence I am closing this as "change agreed" (even though the changes are actually implemented elsewhere).