Closed Dave-Allured closed 8 months ago
I support this proposal. Given the discussions in #237 and cf-discuss #244, it's clear that there is a need to provide support for localization, but also that there are many hazards to expanding the list of allowable characters. This approach seems the be the best way to provide that support while opening up the fewest cans of worms.
Is there a reason to to extend it to a few more punctuation characters, e.g. the colon :
(Code Point U+003A). I believe that the colon was suggested as a separator in one version of the i18n proposal.
Once names are not longer legal as programming language identifiers, maye there's no reason to restrict them more than necessary maybe a couple more chars?
NOTE: I fully agree that full Unicode letters is a larger proposal that should be handled separately.
I would oppose adding the colon.
Adding .
and -
can cause problems for programmatically creating variables in a program with names that are the same as the names of the variables/attributes in a netcdf file, but strategies for working around it seem pretty clear and straightforward to me.
However, there are also workflows that involve parsing the text output of ncdump, which uses :
as the delimiter between variable/attribute name and value. In my experience, it would be much, much harder to modify those to accommodate :
as an allowed character in a variable name.
I think we want to be very conservative when it comes to adding allowed characters, and not do it until and unless we have a clear and strong demand for it, and have had a long discussion about what could work and what problems it could cause. We've done that with .
and -
, but I don't think that's the case for :
.
text output of ncdump, which uses : as the delimiter between variable/attribute name and value
Good point -- yes, that takes colon off the table.
While I am in favor of the general direction this proposal is going, I also very much agree with what @sethmcg writes. Here I would like to approach this from slightly different angle.
The proposal lists the following benefits:
- Support desired new naming patterns for attributes.
- Enable the use of IETF BCP 47 language tags for attributes.
- Enable the use of other meaningful suffixes for attributes.
As the proposal now stands it will certainly support these benefits but it will also come with some possible drawbacks:
I think that we need a deeper discussion on what we want to achieve by allowing the period and hyphen. In netCDF these are already allowed in attribute names, so the question is how CF wants to use them. For example:
attribute.suffix
? Currently, we have one concrete use case and that is the need for localization, for which the requirements are pretty clear: to attach a language tag construct (language-
country) to attributes. From the conversation in discuss/#244 there is convergence towards using period as marker to indicate start of a language tag. This is a much more specific need than a general extension of the allowed character set.
@larsbarring,
- it is not clear what a "suffix" is, and how to distinguish one from some other possible usage patterns involving period and hyphen.
"Suffix" here is a vague reference to possible future uses for the added characters. It is not part of the new text. It does not need further definition at this time.
I think that we need a deeper discussion on what we want to achieve by allowing the period and hyphen. In netCDF these are already allowed in attribute names, so the question is how CF wants to use them.
- Is the period only going to be designated as marker between an attribute (CF meaning) and its suffix, as in
attribute.suffix
?
This particular PR adds two characters without any directed usage. This will enable any desired future usage, public or private, without violating a CF name rule. Think of it as adding punctuation characters with equal status to the underscore.
- If not, how is the specific use case of IETF BCP47 language tags going to be identified in relation to all other possible attribute name patterns that will emerge?
I have not given it much thought. Simple pattern recognition should take care of most cases. BCP47 is highly recognizable. Hypothetically, the details would be negotiated if there was another "suffix" proposal that had potential conflicts with BCP47.
- As a middle ground between these two, are there other specific uses of the period that may be impacted if it is used for demarcating a suffix?
Possibly. I am not concerned about that at this time.
Currently, we have one concrete use case and that is the need for localization, for which the requirements are pretty clear: to attach a language tag construct (language
-
country) to attributes. From the conversation in discuss/#244 there is convergence towards using period as marker to indicate start of a language tag. This is a much more specific need than a general extension of the allowed character set.
Agreed. My intention is a general extension.
it is not clear what a "suffix" is, and how to distinguish one from some other possible usage patterns involving period and hyphen.
In the*nix world (and pretty much everywhere other than the old DOS 8.3 system), a "suffix" is the part after the last dot in a filename. By convension, but not by fiat, that suffix is a short code indicating the file type. As chaotic as that seems, it actually works pretty well. I think that's the case here too.
Step one: allow "." and "-".
Step two: establish one or more conventions for how they can be used, I would think starting with the idea of a "suffix" similar to the above, and then one or more uses for that, e.g. the language spec proposed.
In short, I think it's OK to leave it kind of loosey-goosey.
Hi all, At @JonathanGregory's request, I've volunteered to moderate this proposal. It looks like we are in agreement that the proposal is acceptable as-is, and it has received a sufficient amount of support.
Unless anyone has any new concerns to raise, this proposal will be accepted in three weeks.
Thanks, @sethmcg.
PR #478 needs a couple of additions:
I don't think that these changes require a clock reset, as long as they're done and agreed before merging. @Dave-Allured - could you possibly add these items?
Many thanks, David
In my earlier comment I had some concerns, but they were/are not strong enough. As no one else have expressed thought in the same direction I am happy to support this proposal (with the same comment as @davidhassell regarding additions). Thanks @Dave-Allured and @sethmcg!
Three weeks have passed with no new concerns raised, so this proposal is now slated to be accepted. @davidhassell is PR #478 ready to go?
Hold until I find time for completions requested by Jonathan.
Will do!
Dear @Dave-Allured and @sethmcg
Maybe I can help this along. One of the two outstanding things is to modify the conformance.adoc
. I think the required change is to insert the following:
ASCII period (".", decimal 46, Unicode U+002E) and ASCII hyphen ("-", decimal 45, Unicode U+002D) are also allowed in attribute names only.
as a second bullet in Sect 2.3 of the conformance document.
The other necessary change is to add a line at the top of the history of the working version in history.adoc
.
Cheers
Jonathan
@davidhassell @sethmcg @JonathanGregory Conformance and history docs are now updated as requested. Anything else?
I have just added a couple of nitpicking comments in the PR.
I think it's ready now, thanks @Dave-Allured.
Would you like to merge it, @sethmcg?
@JonathanGregory Merged!
Note: the "For maintainers" section on the PR boilerplate says "After the merge remember to delete the source branch." The button to do that isn't showing up for me, but the branch also isn't showing up in the list of branches for the repo. Has the repo been set to automatically delete merged branches? If so, that reminder should probably be removed from the PR boilerplate.
Note: the "For maintainers" section on the PR boilerplate says "After the merge remember to delete the source branch." The button to do that isn't showing up for me ...
No. This is another github thing. That reminder should be aimed at the requestor, not maintainers. "Source branch" is referring to the requestor's branch, not necessarily a branch in the CF repo. It is generally the requestor's duty to clean up their own branches.
Perhaps that reminder should be reworded to clarify. The way it is now, is good enough for me.
To strictly conform with CF rules, I am considering Lars' last minute PR requests to restart the wait clock for another three weeks. You can leave this issue closed and merged the way it is now. However, it would be fair to reopen the issue if anyone thinks it is necessary. (Hint: I hope not.) Sorry I did not catch this technicality before merge.
Follow up here or in the PR #478, I do not care which.
@Dave-Allured Dave -- Sorry for the late comment!
No need at all to reopen this issue, or to restart the clock. That would only be if there were some kind of material change to the proposal. And my comments was nothing of that kind. Clearly Seth and Jonathan did not think the comments needed to be considered, which I am fine with -- no problem. Thanks for your contribution Dave!
To whom it may concern, I am reverting this change in new issue #548. This original request was based on my misunderstanding of earlier wording about character restrictions.
Moderator: @sethmcg
Moderator Status Review: New issue, 2023 November 10
Requirement Summary: None.
Technical Proposal Summary: Add ASCII period (.) and ASCII hyphen (-) to the set of characters allowed in netCDF attribute names.
Benefits
Caveats
getattr
,putattr
must be used instead.Status Quo: Attribute names are now restricted to letters, digits, and underscore only.
Associated pull request: #478
Detailed Proposal: Add a new sentence at the end of the first paragraph of 2.3 Naming Conventions as follows. The remainder of 2.3 is left unchanged.
Note: This proposal is a subset of CF #237, and is superseded if #237 is adopted.