Allow period and hyphen in attribute names

Dave-Allured commented 1 year ago

Moderator: @sethmcg

Moderator Status Review: New issue, 2023 November 10

Requirement Summary: None.

Technical Proposal Summary: Add ASCII period (.) and ASCII hyphen (-) to the set of characters allowed in netCDF attribute names.

Benefits

Support desired new naming patterns for attributes.
Enable the use of IETF BCP 47 language tags for attributes.
Enable the use of other meaningful suffixes for attributes.

Caveats

Name rules in most programming languages will prevent the use of program variables or attributes with exactly matching names for such attributes.
Referencing such attributes with dot notation will be prevented in Python and similar languages.
Special quoting or access functions such as getattr, putattr must be used instead.

Status Quo: Attribute names are now restricted to letters, digits, and underscore only.

Associated pull request: #478

Detailed Proposal: Add a new sentence at the end of the first paragraph of 2.3 Naming Conventions as follows. The remainder of 2.3 is left unchanged.

Variable, dimension, attribute and group names should begin with a letter and be composed of letters, digits, and underscores. Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character. The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use. ASCII period (.) and ASCII hyphen (-) are also allowed in attribute names only.

Note: This proposal is a subset of CF #237, and is superseded if #237 is adopted.

sethmcg commented 1 year ago

I support this proposal. Given the discussions in #237 and cf-discuss #244, it's clear that there is a need to provide support for localization, but also that there are many hazards to expanding the list of allowable characters. This approach seems the be the best way to provide that support while opening up the fewest cans of worms.

ChrisBarker-NOAA commented 1 year ago

Is there a reason to to extend it to a few more punctuation characters, e.g. the colon : (Code Point U+003A). I believe that the colon was suggested as a separator in one version of the i18n proposal.

Once names are not longer legal as programming language identifiers, maye there's no reason to restrict them more than necessary maybe a couple more chars?

NOTE: I fully agree that full Unicode letters is a larger proposal that should be handled separately.

sethmcg commented 1 year ago

I would oppose adding the colon.

Adding . and - can cause problems for programmatically creating variables in a program with names that are the same as the names of the variables/attributes in a netcdf file, but strategies for working around it seem pretty clear and straightforward to me.

However, there are also workflows that involve parsing the text output of ncdump, which uses : as the delimiter between variable/attribute name and value. In my experience, it would be much, much harder to modify those to accommodate : as an allowed character in a variable name.

I think we want to be very conservative when it comes to adding allowed characters, and not do it until and unless we have a clear and strong demand for it, and have had a long discussion about what could work and what problems it could cause. We've done that with . and -, but I don't think that's the case for :.

ChrisBarker-NOAA commented 1 year ago

text output of ncdump, which uses : as the delimiter between variable/attribute name and value

Good point -- yes, that takes colon off the table.

larsbarring commented 1 year ago

While I am in favor of the general direction this proposal is going, I also very much agree with what @sethmcg writes. Here I would like to approach this from slightly different angle.

The proposal lists the following benefits:

Support desired new naming patterns for attributes.

Enable the use of IETF BCP 47 language tags for attributes.

Enable the use of other meaningful suffixes for attributes.

As the proposal now stands it will certainly support these benefits but it will also come with some possible drawbacks:

Attribute names can have any combination of the allowed characters, of which some may not be desired (acknowledging that this is a value statement)
it is not clear what a "suffix" is, and how to distinguish one from some other possible usage patterns involving period and hyphen.

I think that we need a deeper discussion on what we want to achieve by allowing the period and hyphen. In netCDF these are already allowed in attribute names, so the question is how CF wants to use them. For example:

Is the period only going to be designated as marker between an attribute (CF meaning) and its suffix, as in attribute.suffix?
If not, how is the specific use case of IETF BCP47 language tags going to be identified in relation to all other possible attribute name patterns that will emerge?
As a middle ground between these two, are there other specific uses of the period that may be impacted if it is used for demarcating a suffix?

Currently, we have one concrete use case and that is the need for localization, for which the requirements are pretty clear: to attach a language tag construct (language-country) to attributes. From the conversation in discuss/#244 there is convergence towards using period as marker to indicate start of a language tag. This is a much more specific need than a general extension of the allowed character set.

Dave-Allured commented 1 year ago

@larsbarring,

it is not clear what a "suffix" is, and how to distinguish one from some other possible usage patterns involving period and hyphen.

"Suffix" here is a vague reference to possible future uses for the added characters. It is not part of the new text. It does not need further definition at this time.

I think that we need a deeper discussion on what we want to achieve by allowing the period and hyphen. In netCDF these are already allowed in attribute names, so the question is how CF wants to use them.

Is the period only going to be designated as marker between an attribute (CF meaning) and its suffix, as in attribute.suffix?

This particular PR adds two characters without any directed usage. This will enable any desired future usage, public or private, without violating a CF name rule. Think of it as adding punctuation characters with equal status to the underscore.

If not, how is the specific use case of IETF BCP47 language tags going to be identified in relation to all other possible attribute name patterns that will emerge?

I have not given it much thought. Simple pattern recognition should take care of most cases. BCP47 is highly recognizable. Hypothetically, the details would be negotiated if there was another "suffix" proposal that had potential conflicts with BCP47.

As a middle ground between these two, are there other specific uses of the period that may be impacted if it is used for demarcating a suffix?

Possibly. I am not concerned about that at this time.

Currently, we have one concrete use case and that is the need for localization, for which the requirements are pretty clear: to attach a language tag construct (language-country) to attributes. From the conversation in discuss/#244 there is convergence towards using period as marker to indicate start of a language tag. This is a much more specific need than a general extension of the allowed character set.

Agreed. My intention is a general extension.

ChrisBarker-NOAA commented 1 year ago

it is not clear what a "suffix" is, and how to distinguish one from some other possible usage patterns involving period and hyphen.

In the*nix world (and pretty much everywhere other than the old DOS 8.3 system), a "suffix" is the part after the last dot in a filename. By convension, but not by fiat, that suffix is a short code indicating the file type. As chaotic as that seems, it actually works pretty well. I think that's the case here too.

Step one: allow "." and "-".

Step two: establish one or more conventions for how they can be used, I would think starting with the idea of a "suffix" similar to the above, and then one or more uses for that, e.g. the language spec proposed.

In short, I think it's OK to leave it kind of loosey-goosey.

sethmcg commented 10 months ago

Hi all, At @JonathanGregory's request, I've volunteered to moderate this proposal. It looks like we are in agreement that the proposal is acceptable as-is, and it has received a sufficient amount of support.

Unless anyone has any new concerns to raise, this proposal will be accepted in three weeks.

davidhassell commented 10 months ago

Thanks, @sethmcg.

PR #478 needs a couple of additions:

the conformance document needs updating
the revision history needs an entry

I don't think that these changes require a clock reset, as long as they're done and agreed before merging. @Dave-Allured - could you possibly add these items?

Many thanks, David

larsbarring commented 9 months ago

In my earlier comment I had some concerns, but they were/are not strong enough. As no one else have expressed thought in the same direction I am happy to support this proposal (with the same comment as @davidhassell regarding additions). Thanks @Dave-Allured and @sethmcg!

sethmcg commented 9 months ago

Three weeks have passed with no new concerns raised, so this proposal is now slated to be accepted. @davidhassell is PR #478 ready to go?

Dave-Allured commented 9 months ago

Hold until I find time for completions requested by Jonathan.

sethmcg commented 9 months ago

Will do!

JonathanGregory commented 8 months ago

Dear @Dave-Allured and @sethmcg

Maybe I can help this along. One of the two outstanding things is to modify the conformance.adoc. I think the required change is to insert the following:

ASCII period (".", decimal 46, Unicode U+002E) and ASCII hyphen ("-", decimal 45, Unicode U+002D) are also allowed in attribute names only.

as a second bullet in Sect 2.3 of the conformance document.

The other necessary change is to add a line at the top of the history of the working version in history.adoc.

Cheers

Jonathan

Dave-Allured commented 8 months ago

@davidhassell @sethmcg @JonathanGregory Conformance and history docs are now updated as requested. Anything else?

larsbarring commented 8 months ago

I have just added a couple of nitpicking comments in the PR.

JonathanGregory commented 8 months ago

I think it's ready now, thanks @Dave-Allured.

Would you like to merge it, @sethmcg?

sethmcg commented 8 months ago

@JonathanGregory Merged!

Note: the "For maintainers" section on the PR boilerplate says "After the merge remember to delete the source branch." The button to do that isn't showing up for me, but the branch also isn't showing up in the list of branches for the repo. Has the repo been set to automatically delete merged branches? If so, that reminder should probably be removed from the PR boilerplate.

Dave-Allured commented 8 months ago

Note: the "For maintainers" section on the PR boilerplate says "After the merge remember to delete the source branch." The button to do that isn't showing up for me ...

No. This is another github thing. That reminder should be aimed at the requestor, not maintainers. "Source branch" is referring to the requestor's branch, not necessarily a branch in the CF repo. It is generally the requestor's duty to clean up their own branches.

Perhaps that reminder should be reworded to clarify. The way it is now, is good enough for me.

Dave-Allured commented 8 months ago

To strictly conform with CF rules, I am considering Lars' last minute PR requests to restart the wait clock for another three weeks. You can leave this issue closed and merged the way it is now. However, it would be fair to reopen the issue if anyone thinks it is necessary. (Hint: I hope not.) Sorry I did not catch this technicality before merge.

Follow up here or in the PR #478, I do not care which.

larsbarring commented 8 months ago

@Dave-Allured Dave -- Sorry for the late comment!

No need at all to reopen this issue, or to restart the clock. That would only be if there were some kind of material change to the proposal. And my comments was nothing of that kind. Clearly Seth and Jonathan did not think the comments needed to be considered, which I am fine with -- no problem. Thanks for your contribution Dave!

Dave-Allured commented 1 month ago

To whom it may concern, I am reverting this change in new issue #548. This original request was based on my misunderstanding of earlier wording about character restrictions.

cf-convention / cf-conventions

Allow period and hyphen in attribute names #477