cf-convention / cf-conventions

Clarify and update CF rules for deprecating content #328

Open ethanrd opened 3 years ago

ethanrd commented 3 years ago

The paragraph in the CF rules document that discusses deprecation is focused on recent (or even the most recent) changes and versions. The rules for deprecation should be updated and clarified for other situations. For instance, a deprecation in issue #314 impacts text that has been part of CF since version 1.0.

JonathanGregory commented 3 years ago

Dear @ethanrd

When we added that part to the rules, this situation had never arisen. I believe this is the first time. As I said in https://github.com/cf-convention/cf-conventions/issues/314, I think that the aim of deprecation should be to discourage faulty data from being written, and it should be the minimal recommendation that would achieve that effect.

Cheers

Jonathan

JonathanGregory commented 3 years ago

Dear Klaus et al.

@zklaus commented as follows in https://github.com/cf-convention/cf-conventions/issues/314:

> I was a bit confused by how the term deprecation was used by @davidhassell and @JonathanGregory, so I searched for it in the issue, finding that I myself introduced it here. Allow me to clarify how I understand it.
>
> Deprecation doesn't apply to versions of an artifact, be it a software package or a standards document. Rather, it applies to specific features. What it says is: "We think this feature should not be used going forward. To allow for a transition period, we do not remove it at this point in time, so you can still use it for a bit, but we'd rather you don't, and we want to remove it in a later version." In my mind, it does not retroactively declare past versions wrong, and writing a new file today that declares that it follows the CF conventions version 1.6 is perfectly legal, if ill-advised.
>
> Independent of any deprecation, we might want to have a recommendation to always use the latest version of CF available for new developments.

Maybe we didn't use the word "deprecation" correctly in the rules, where we wrote:

> If the change, once implemented in the conventions, subsequently turns out to be materially flawed, meaning that data written following the convention could be somehow erroneous or ambiguous, a github issue should urgently be opened to discuss whether to revoke the change. If this is agreed by a majority of the committee, a new version of the conventions will be prepared immediately, with the second digit of the version number incremented, and will be recommended to be used instead of the flawed version. The flawed version will be deprecated by a statement in the standard document and the conformance document. However, any data written with the flawed version will not be invalidated, although it may be problematic for users.

As Ethan has said, in this case there is no change to be revoked - we hadn't anticipated that we would discover an error that affects all existing versions! Nonetheless, the principle ought to apply. Data written with the flawed versions (all existing ones) is still legal, but might be problematic, so we want to minimise the use of these versions for new data. In my opinion, we are saying retroactively that all these versions are wrong - not everything which is legal is right, after all! To deprecate something means to express disapproval of it. That is what we are doing. We disapprove of all these versions, but only as regards the specific feature we are correcting. Hence my proposal that we deprecate the versions <1.9 in this feature only.

Best wishes

Jonathan

ethanrd commented 3 years ago

Hi all - As I mentioned in issue #314, there are a number of deprecations in the current CF specification. Two involve backwards compatibility with COARDS and have been in CF since version 1.0; one of these involves non-compliance with UDUNITS and the other a temporary(?) deprecation in the NUG. The rest are more recent changes.

Here’s the list of the deprecations currently found in CF 1.9-draft:

davidhassell commented 3 years ago

Hello,

Is it right that the deprecations that Ethan lists (https://github.com/cf-convention/cf-conventions/issues/328#issuecomment-846140071) are still allowed? i.e. these are not wrong, but are discouraged. This is a different situation to #314, for which the formula terms was wrong and its old form is, from this time onwards, disallowed (is that the right word?) when writing CF<=1.8.

Perhaps we need an appendix to summarize this sort of information - deprecations and errors - as well as in the relevant parts of the text, for maximum visibility (I have a feeling this has already been suggested - but I can't find where!).

JonathanGregory commented 3 years ago

Dear @ethanrd and @davidhassell

I would say that the conformance document should provide our definitive list of deprecations. A "deprecation" there is a recommendation not to do it; the CF checker gives a warning about any recommendation that can be checked and isn't followed. Any deprecations that are mentioned in the text should be in the conformance document. Maybe not all of those Ethan has detailed are in it, but they should be, I would argue. The first one is there, for example (in section 3.1 of conformance). Not all of them can be checked automatically, or not easily, but they should still be stated anyway, I think.

The deprecation of flawed versions, like in https://github.com/cf-convention/cf-conventions/issues/314, is different. In this case, we have identified an error in the convention, which allows metadata to be written that can't be interpreted reliably. In the other cases, there's nothing actually wrong, and the recommendations are made with the aim of writing metadata which is easier to use in some way.

Best wishes

Jonathan

ethanrd commented 3 years ago

I agree the deprecation in #314 is different than those currently in the specification. Perhaps deprecation is not the right word for this usage. Maybe instead errata or corrigenda?

Whatever words we use, I'm not sure CF should (or how it would) "disallow" the #314 deprecation in earlier versions of CF. Any existing data written using this feature would have been conforming at the time it was written. To now make that data non-conforming does not seem right. On the other hand, a simple warning does not seem enough.

Perhaps CF needs a few categories of deprecations:

(I'm not sure this really helps, as what does "very strong error" really mean? Written in all caps and bold text?)

JonathanGregory commented 3 years ago

Dear @ethanrd

I would favour simplicity. At the moment, we have two categories in the conformance document. (1) Recommendations to do something. A recommendation not to do something is a deprecation. The CF checker gives warnings for recommendations (including deprecations) that are not met. (2) Requirements to do something. A requirement not to do something is a prohibition. The CF checker gives errors for requirements (including prohibitions) which are not met. I feel that this is sufficient, provided we make sure all the recommendations and requirements in the standard are included in the conformance document. We can't foresee what will happen in future versions of the convention.

Maybe we should make the https://github.com/cf-convention/cf-conventions/issues/314 deprecation into a prohibition instead i.e. disallow (as you say) the flawed old version of "sigma over z"? Then the checker would give an error if it detects it. We could make clear in stating the requirement that it applies to new data (from now on), and does not invalidate existing data (although such data may unavoidably be problematic).
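As an illustration only, the two-category scheme described above could be sketched as follows. This is not taken from the CF checker: the `Rule` class, the `report` function, the rule identifiers and the rule wording are all hypothetical, chosen to show that a deprecation is handled as a recommendation (warning) and a prohibition as a requirement (error).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    RECOMMENDATION = "recommendation"  # includes deprecations (negative recommendations)
    REQUIREMENT = "requirement"        # includes prohibitions (negative requirements)


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


# Unmet recommendations produce warnings; unmet requirements produce errors.
SEVERITY = {
    Category.RECOMMENDATION: Severity.WARNING,
    Category.REQUIREMENT: Severity.ERROR,
}


@dataclass(frozen=True)
class Rule:
    rule_id: str        # hypothetical identifier, not from the conformance document
    category: Category
    text: str


def report(rule: Rule, satisfied: bool) -> Optional[Severity]:
    """Return the severity to report for a rule, or None if the rule is met."""
    return None if satisfied else SEVERITY[rule.category]


# Hypothetical rules, for illustration only.
deprecation = Rule("example-deprecation", Category.RECOMMENDATION,
                   "Feature X should not be used.")
prohibition = Rule("example-prohibition", Category.REQUIREMENT,
                   "Feature Y must not be used when writing new data.")
```

Under this sketch, turning a deprecation into a prohibition only changes the rule's category; the mapping to warnings and errors stays the same.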

Best wishes

Jonathan

erget commented 3 years ago

TLDR: My opinion is that the rules are sound for correcting errata, but we do not describe what to do in the case of deprecation. This may not be necessary because we could consider deprecation normal care and feeding for the standard. I do agree that we should have a list of deprecations and the CF Checker should warn of deprecated features. We could consider deciding upon and announcing versions at which we will remove deprecated features.

Musings: I too prefer simplicity, when it's possible. Would it be possible to use the deprecation mechanism differently for 2 classes of items?

  1. Errata
  2. Discontinued features

My reasoning here is that we would remove these items from the standard with different speeds.

As @JonathanGregory highlights, there is a procedure for errata (abridged by me):

> If the change... turns out to be ... flawed, ... a new version of the conventions will be prepared immediately, with the second digit of the version number incremented, and will be recommended to be used instead of the flawed version. The flawed version will be deprecated by a statement in the standard document and the conformance document. However, any data written with the flawed version will not be invalidated, although it may be problematic for users.

This remains reasonable in my mind, and it would be used in the case that quick action is needed.

Then there are deprecations like @ethanrd notes - I'll speak to the use of projection_x_coordinate and projection_y_coordinate in conjunction with the geostationary grid mapping, as I was involved in that one. That is a feature discovered to be erroneous and deprecated by us. However, we've been living with this error for many years now and nobody complained - the deprecation was due to an (over-) abundance of precision. Thus we have not yet set a date at which mention of those attributes will be completely removed from that part of the standard; it is not urgent.

We could adopt the same approach if we ever have a feature that is overly complex and we no longer want to support the use of it - I'm not advocating this, but for the sake of argument let's say it's the use of packed data described in Section 8.1. In that case, we could deprecate the feature, and even inform users with e.g. 2 releases of notice that it will at some point disappear from the standard. In this case it would not be due to an error, but we could treat it the same way.

zklaus commented 3 years ago

I agree with what @ethanrd and @erget said, namely that we have errata and what I would call deprecations.

I think it is quite important to actually remove deprecations at some point, preferably under a predictable policy, e.g. two versions after the initial deprecation. The reason is that these features really become a burden on producers of CF tooling, like libraries for reading CF files. If we essentially allow everything indefinitely, it becomes increasingly difficult to produce a conforming implementation. This really is worse in the case of deprecations and errata than with normal evolution, because deprecation often seems to happen together with a new formulation taking the place of the deprecated feature. That, in turn, implies that as a library maintainer now you need to take care of two different formulations of the same phenomenon, of which one is known to be bad in some sense.

If somebody really needs to rely on an old feature, they are always free to write a file according to the old standard and put the corresponding ":Conventions" attribute inside.

JonathanGregory commented 3 years ago

Dear all

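I'd like to repeat my earlier points that

* We should make use of the existing list, namely the conformance document, for the purposes being discussed here - I don't think we need a new list.
* We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

Does anyone disagree with those points, I wonder?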

Which of these categories should be used if we discover a flaw in the convention, which allows metadata to be produced that can't be interpreted correctly or reliably (as in https://github.com/cf-convention/cf-conventions/issues/314)? The rules say "deprecate" but I think now that's too weak. We should prohibit the use of the flawed convention (not the whole version, just the affected part) for writing new data (but also reassure users that existing data isn't being invalidated).

I appreciate the arguments about the need for a further distinction, and I agree this could help in other cases. I suggest that we need to distinguish between recommendations which are made for good practice (and could remain for ever), and recommendations which are made because there are alternative ways to do something where one is preferred and the other might be abolished in future. That is, we would have three categories in the conformance document, rather than two. An example of a good-practice recommendation in the conformance document is "The name of a multidimensional coordinate variable should not match the name of any of its dimensions." We do not envisage making this a requirement.

In general, we do not try to foresee the future of the CF convention, so I think most of the current recommendations are for good-practice. There should only be a few where we think it's really likely that we are going to abolish something in future. I note that, up to now, we have not abolished anything. One reason for that is because past data continues to exist for a long time. Therefore it's hard to withdraw support for any feature in data-reading programs without causing inconvenience, although you can in data-writing programs. I know that you can always inspect the Conventions attribute, but most user programs don't pay attention to that, I imagine, and we should avoid making things awkward for users.

Hence I would suggest identifying which current recommendations in the conformance document, or which should be in it but aren't, ought to be promoted to a new category of things which really might become requirements/prohibitions in future. What could this new category be called? Warnings?

Best wishes

Jonathan

zklaus commented 3 years ago

> * We should make use of the existing list, namely the conformance document, for the purposes being discussed here - I don't think we need a new list.

I agree with this.

> * We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

I don't understand this.

> Which of these categories should be used if we discover a flaw in the convention, which allows metadata to be produced that can't be interpreted correctly or reliably (as in #314)? The rules say "deprecate" but I think now that's too weak. We should prohibit the use of the flawed convention (not the whole version, just the affected part) for writing new data (but also reassure users that existing data isn't being invalidated).
>
> I appreciate the arguments about the need for a further distinction, and I agree this could help in other cases. I suggest that we need to distinguish between recommendations which are made for good practice (and could remain forever), and recommendations which are made because there are alternative ways to do something where one is preferred and the other might be abolished in the future. That is, we would have three categories in the conformance document, rather than two. An example of a good-practice recommendation in the conformance document is "The name of a multidimensional coordinate variable should not match the name of any of its dimensions." We do not envisage making this a requirement.
>
> In general, we do not try to foresee the future of the CF convention, so I think most of the current recommendations are for good practice. There should only be a few where we think it's really likely that we are going to abolish something in the future. I note that, up to now, we have not abolished anything. One reason for that is because past data continues to exist for a long time. Therefore it's hard to withdraw support for any feature in data-reading programs without causing inconvenience, although you can in data-writing programs. I know that you can always inspect the Conventions attribute, but most user programs don't pay attention to that, I imagine, and we should avoid making things awkward for users.

This may very well be a chicken-and-egg problem. After all, if I have to support every feature since the first version anyway, why check the version number? It seems to me that this approach becomes less and less tractable as the complexity of the conventions increases.

> Hence I would suggest identifying which current recommendations in the conformance document, or which should be in it but aren't, ought to be promoted to a new category of things which really might become requirements/prohibitions in the future. What could this new category be called? Warnings?

Continuing this line of thought and drawing further analogy with version numbers in software packages, a flaw that leads to incorrect or uninterpretable metadata could be considered a bug and as such warrant the release of a bug-fix version of the conventions. This could mean releasing version 1.7.1, 1.8.1, and 1.9.1 all at the same time, fixing only the relevant bug but otherwise not impacting the feature set of the corresponding version. To avoid the undue burden of maintaining long-obsolete versions, one would probably want to declare only a very limited set of versions as supported, say the last two.

This way, old versions can be updated to fix manifest flaws, which makes it also more plausible to actually retire deprecated features because a user that relies on such a feature can find comfort in knowing that the old version he now must use does not fall quickly into disrepair.

User software, which already increasingly is built on standard libraries for interacting with CF data, can check the conventions attribute in a meaningful way and support different versions of the conventions even when the feature sets differ or offer different best practice implementations for the same encoding requirement.
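A minimal sketch of what such a check could look like, assuming the `Conventions` attribute holds a space- or comma-separated list of convention names such as "CF-1.8 ACDD-1.3". The function names and the `removed_in` parameter are hypothetical, and the optional third version number ("CF-1.8.1") is an assumption that follows the bug-fix-release idea above, not current CF practice.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Matches tokens such as "CF-1.8"; the optional patch number is an assumption.
_CF_TOKEN = re.compile(r"^CF-(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_cf_version(conventions: str) -> Optional[Version]:
    """Return (major, minor, patch) of the first CF token, or None if absent."""
    for token in re.split(r"[,\s]+", conventions.strip()):
        match = _CF_TOKEN.match(token)
        if match:
            # A missing patch number is treated as 0.
            major, minor, patch = match.groups(default="0")
            return int(major), int(minor), int(patch)
    return None


def feature_available(conventions: str, removed_in: Version) -> bool:
    """Decide whether a reader should still honour a feature removed in `removed_in`.

    `removed_in` is a hypothetical version; no CF feature has been removed so far.
    """
    version = parse_cf_version(conventions)
    if version is None:
        # Design choice of this sketch: stay permissive if no CF version is declared.
        return True
    return version < removed_in
```

A reading library could call `feature_available(ds_conventions, removed_in=(1, 11, 0))` to choose between the old and the new formulation of a feature, where `ds_conventions` is the attribute value read from the file.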

@JonathanGregory is of course correct that all of this goes far beyond #314, but that is how I understood the intent of @ethanrd in opening this issue.

JonathanGregory commented 3 years ago

Dear @zklaus

> We don't have to distinguish positive and negative categories, because they are logically related: prohibition = negative requirement, deprecation = negative recommendation.

Sorry to be unclear. I meant by this to explain that the conformance document has only two categories, viz. recommendation and requirement, and that we don't need to have deprecation and prohibition as two more, because a prohibition is a kind of requirement and a deprecation is a kind of recommendation. Does that make sense?

I agree it is possible that we should remove some features when there is an alternative better way, but I think there has to be a strong case for it. Personally, being mostly a data-analyst and data-producer, I think we should favour making it attractive and easy for data-readers and data-writers to use the conventions (correctly, of course, and not carelessly), even if that makes a bit more work for authors of software. That is because writers of widely used software are generally software experts, whereas producers and analysts of data are generally not software experts.

Again I would suggest that a helpful way to proceed would be to identify any current deprecations (recommendations against doing things) in the conformance or conventions document where there could be a strong case for removing some feature in future. Then we will see how large an issue it is, which will help us to decide how to deal with it.

Best wishes

Jonathan

ChrisBarker-NOAA commented 1 month ago

Picking this up again:

from above:

> we don't need to have deprecation and prohibition as two more, because a prohibition is a kind of requirement and a deprecation is a kind of recommendation.

Yes, but deprecation is a very specific kind of recommendation -- it recommends you don't use the thing that's deprecated, but it very explicitly states that it might go away, at some time in the future.

I think:

"this isn't a good idea"

is different than

"this isn't a good idea, and it will not work in the future" -- even if "future" isn't currently defined.

But most critically, if we EVER want to be able to remove anything some day (CF 2) -- we will need to be able to clearly define that there's been a deprecation period -- be able to say "you were warned".

And even if we have no idea when that breaking change will actually occur, it would be good to start preparing now -- if there's a feature we'd like to remove, let's deprecate it now.
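To make the distinction concrete, here is a rough sketch of how a deprecation with an open-ended removal date could be recorded. Everything in it is hypothetical (the `Deprecation` class, the `status` function, the feature name and the version numbers); it only illustrates that a feature can be marked as deprecated now while the removal version stays undefined.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]


@dataclass(frozen=True)
class Deprecation:
    feature: str                          # hypothetical feature name
    deprecated_in: Version                # version that first warned about the feature
    removed_in: Optional[Version] = None  # None: removal intended but not yet scheduled


def status(dep: Deprecation, version: Version) -> str:
    """Classify a feature for a given CF version: allowed, deprecated or removed."""
    if version < dep.deprecated_in:
        return "allowed"
    if dep.removed_in is not None and version >= dep.removed_in:
        return "removed"
    return "deprecated"


# Deprecated now, with no removal date yet: the "you were warned" period starts here.
open_ended = Deprecation("example_feature", deprecated_in=(1, 9))
# The same record once a breaking release has been decided.
scheduled = Deprecation("example_feature", deprecated_in=(1, 9), removed_in=(2, 0))
```

With `open_ended`, every version from 1.9 onwards reports "deprecated"; only once `removed_in` is filled in, as in `scheduled`, does a later version report "removed".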

So yes:

> a helpful way to proceed would be to identify any current deprecations (recommendations against doing things) in the conformance or conventions document where there could be a strong case for removing some feature in future.

That's the next step.