cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Interpretation of negative years in the units attribute #298

Closed peterkuma closed 3 years ago

peterkuma commented 4 years ago

I have encountered an issue with using a negative year in the units attribute of time variables in NetCDF files. The interpretation in cftime (Python) is that year zero does not exist, while in other software such as Panoply the interpretation is that year zero exists. This affects how the time variable is read and displayed, and effectively causes a one-year difference between the different implementations. The CF Conventions (Section 4.4. Time Coordinate) do not explicitly state how negative years should be treated, except for stating that year 0 has a special meaning. By contrast, ISO 8601 seems to lean toward including year 0.

In particular, this issue comes up when using Julian date in NetCDF files, which has a reference time of 1 January 4713 BCE, 12:00 UTC. As of now it is impossible to use it in NetCDF files and get consistent results in Python (through the netCDF4 package) and Panoply.
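To make the reference epoch concrete, here is a minimal sketch of the widely used integer formula for the Julian Day Number of a date in the proleptic Julian calendar. It assumes astronomical year numbering (year 0 exists, so 4713 BCE is written as year -4712). The function name `jdn_from_julian` is illustrative and is not taken from any library mentioned in this thread.

```python
def jdn_from_julian(year, month, day):
    """Julian Day Number (at 12:00) of a proleptic Julian calendar date.

    Assumption: astronomical year numbering, i.e. 0 = 1 BCE, -1 = 2 BCE, ...
    """
    a = (14 - month) // 12   # 1 for January and February, otherwise 0
    y = year + 4800 - a      # shift the year so the arithmetic stays positive
    m = month + 12 * a - 3   # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# 1 January 4713 BCE is year -4712 when year zero is counted.
print(jdn_from_julian(-4712, 1, 1))    # 0
# Last day of the Julian calendar before the 1582 reform.
print(jdn_from_julian(1582, 10, 4))    # 2299160
```

If the same epoch is written as year -4713 (no year zero) but read by software that counts year zero, the result is shifted by one year, which is the discrepancy described above.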

I suppose there are multiple possible solutions to the problem. Either all implementations start using the same method of counting negative years (and it would be helpful if the CF Conventions make this unambiguous), or there would have to be information about the year numbering convention included in the NetCDF file, such as a new attribute or an indicator included in the units or calendar attributes.

Related issue in cftime: #200.

Dave-Allured commented 4 years ago

@peterkuma, CF has never standardized the meaning of year zero or negative years. As you have discovered, different software developers have implemented various and conflicting interpretations.

Best practice is to encode all timekeeping so that zero and negative are never encountered in either recorded times or in the reference time in the units attribute. Furthermore, when recording modern real world times, use only years later than 1582 in both positions, to avoid the Julian/Gregorian crossover. Please see CF section Time Coordinate for more details and caveats.

For most applications, I recommend a reference date of January 1, hour zero, of the first year of your data domain, or some other round year number close to but earlier than the start.

It seems like you have an application that is well aware of correct real world dates and times when the data are originally being recorded. Is there something that would prevent this application from encoding times in this unambiguous way, relative to a modern reference time?

peterkuma commented 4 years ago

@Dave-Allured, thank you for your response. As you say, the issue can be avoided by using a reference time after 1582, which is fine for end users who can choose the reference time. However, I think the situation for the developers of generic software which uses NetCDF is still unresolved, because they don't have concrete guidance on how to use these reference times. My proposal would be to add text to the CF Conventions saying how negative years should be interpreted, even if it is something as short as:

"When the year of the reference time is negative, year 0 is not counted in calculations involving this reference time."

(or a similar statement), appended to the second paragraph of Section 4.4. Time Coordinate.

To answer your question, I write relatively generic software for use with climate observations and climate modeling, which doesn't know what time range is going to be used by the user. I prefer to use Julian date everywhere in my code because it makes it easy to perform any calculations when all time variables have the same reference time. It would be great if NetCDF had good support for this use case.

Dave-Allured commented 4 years ago

@peterkuma, I share your interest in standardizing the treatment of zero and negative years. However, I am afraid your use case may not be appropriate for this task. My experience so far is that almost all climate-related obs and model data sets that might use CF encoding are in the domain of only positive year numbers. I speculate that most climate model makers deliberately avoided zero and negative years because of this uncertainty. I might be wrong; I have not checked lately.

You are writing generic software for climate obs and modeling. If you plan to use a fixed reference time for internal software purposes, then I suggest 1 January +0001 00:00, rather than the astronomical calendar base that you mentioned, to reduce problems. Take appropriate care with the Julian/Gregorian discontinuity.
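As a rough illustration of a fixed internal reference of 0001-01-01, the sketch below uses Python's standard `datetime` module. Note the assumption: `datetime` is purely proleptic Gregorian, so it sidesteps rather than handles the Julian/Gregorian discontinuity, and it cannot represent years below 1. The helper names are hypothetical.

```python
from datetime import date, timedelta

# 0001-01-01 in Python's proleptic Gregorian calendar (an assumption of this
# sketch; the CF "standard" calendar is mixed Julian/Gregorian instead).
REFERENCE = date(1, 1, 1)


def days_since_reference(d):
    """Encode a date as whole days since 0001-01-01."""
    return (d - REFERENCE).days


def date_from_days(n):
    """Decode whole days since 0001-01-01 back to a date."""
    return REFERENCE + timedelta(days=n)


encoded = days_since_reference(date(2000, 1, 1))
print(encoded, date_from_days(encoded))
```

With a positive reference year, every encoded value for modern data is a positive count, and no zero or negative year ever appears in the units string.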

If that is not satisfactory, then I am glad to keep discussing a CF amendment.

martinjuckes commented 4 years ago

@Dave-Allured : I think this does deserve some further discussion because the study of climate in the distant past is a very important part of climate science, even if it is small in scale compared to the study of present day climate. Our problem is that the distant past is more important for our community than it is for many of the people who work on standards for dates and times.

The NetCDF4 library may have resolved this for us, unless we make an active decision to depart from the treatment of negative reference times in NetCDF4. For example, if I create a file from the following CDL using ncgen (from version 4.6 of the NetCDF library):

netcdf time_ex01 {
dimensions:
    time = 3 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "days since -0001-01-01" ;
        time:calendar = "standard" ;

// global attributes:
    :Conventions = "CF-1.7" ;
data:
time = 0, 365, 731 ;
}

and then run ncdump -t, the resulting time values are given as: time = "-0001-01-01", "0000-01-01", "0001-01-01" ; i.e. NetCDF4 interprets 0001-01-01 as being two years after -0001-01-01. Unfortunately, this is not well documented in the NetCDF User Guide, but it is unambiguous. I think this takes precedence over the cftime library, because we are trying to build on top of the NetCDF data model ... but there may be other interpretations of that.
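The arithmetic behind these three values can be sketched in a few lines. The code below is not ncdump's or cftime's implementation; it is a toy model that assumes the Julian leap rule (every fourth year, which is what the standard calendar uses this far back) and only handles non-negative offsets from 1 January of the reference year. It decodes the same offsets under both year-numbering conventions.

```python
def year_length(astro_year):
    """Days in a Julian-calendar year, using astronomical numbering (0 = 1 BC)."""
    return 366 if astro_year % 4 == 0 else 365


def advance(astro_year, days):
    """Step forward `days` (>= 0) from 1 January; return (year, day_of_year)."""
    while days >= year_length(astro_year):
        days -= year_length(astro_year)
        astro_year += 1
    return astro_year, days


def with_year_zero(label, days):
    """Year labels are plain integers: ..., -1, 0, 1, ..."""
    return advance(label, days)


def without_year_zero(label, days):
    """Year labels skip zero: ..., -2, -1, 1, 2, ... (-1 means 1 BC)."""
    if label == 0:
        raise ValueError("year 0 does not exist in this convention")
    astro = label + 1 if label < 0 else label
    astro, rest = advance(astro, days)
    return (astro - 1 if astro <= 0 else astro), rest


for offset in (0, 365, 731):
    print(offset, with_year_zero(-1, offset), without_year_zero(-1, offset))
```

Under the year-zero convention the offsets land on 1 January of years -1, 0 and 1, matching the ncdump output above. Without year zero, the label -1 means 1 BC, which is a leap year, so 365 days falls on its last day and 731 days lands on 1 January of year 2: the one-year disagreement reported at the top of this issue.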

JonathanGregory commented 4 years ago

This has been discussed in CF before but without conclusion. It would certainly be useful to adopt a convention for it, because there are use-cases, as @peterkuma demonstrates. It's not a problem with the reference year itself, but with the definition of the calendar. Given the lengthy debates about what "calendar" means when we were discussing leap-seconds, I use that word with some nervousness! I mean by "calendar" the set of valid dates (DD-MM-YYYY), which is implied by the choice of the calendar attribute in CF.

In the standard calendar, as we all know, there is no year between 1 AD (CE) and 1 BC (BCE). I suppose it's because year 0 doesn't exist that COARDS chose year 0 to indicate climatological time. (CF supports that convention for compatibility with COARDS, which only deals with the real-world standard calendar.) I'm interested to see what @martinjuckes reports about NetCDF-4. If you accept 0 as a valid year number, it means you have to write 2 BC as year -1, 3 BC as year -2, etc. That seems rather confusing to me, and likely to lead to mistakes. However, it seems that this is what is done for the proleptic Julian calendar, which is used in astronomy. Wikipedia says, "year 1 of the Julian Period was 4713 BC (−4712)." It seems that there is a year 0 in that calendar. Is that correct? For model calendars, I guess that year zero probably does exist, because it's an inconvenience to arithmetic if you leave it out!

If we decide there isn't a well-defined best answer, and there are divergent use-cases, we could define different CF calendars with and without year zero.

dopplershift commented 4 years ago

I'm pretty sure that netCDF-C's support of dates is really just for minimal convenience and not intended to be any kind of standard. I'll poke @WardF and @lesserwhirls to chime in here...

JonathanGregory commented 4 years ago

It might be reasonable to define the standard calendar as not permitting years less than 1. Whether or not year 0 exists is a choice for the proleptic calendars, I think.

martinjuckes commented 4 years ago

I initially liked Jonathan's idea of introducing one or more new proleptic calendars to make it explicit when people care about the interpretation of negative years in the reference time, but, after thinking over the point discussed below, I would prefer to use a qualifier, e.g. proleptic_gregorian cardinal.

If you take an etymological approach, I think AD 1 refers, in effect, to "year 1": i.e. it is a reference of a time period, not a point in time. Similarly, 1 BC is "year 1 before". From that perspective, it is natural that there is no AD 0 or 0 BC. UTC adopts the convention that 2020-01-01 00:00:00 refers to the start of AD 2020, but says nothing about negative times.

The question, I think, is not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?".

Given that positive YYYY in the datestamp is universally understood to refer to AD YYYY, arguments can be made for interpreting -YYYY either as an extension backwards treating the years as a sequence of integers, or as 1 BC. This is an encoding choice. For this reason, I would not modify the calendar, but instead add an optional qualifier to identify the convention for encoding BC years in the timestamp.

I'm not sure whether there is a good mnemonic term, but I suggest proleptic_gregorian cardinal to refer to use of cardinals, -1, 0, 1 rather than ordinals, 2nd before, 1st before, 1st after.

lesserwhirls commented 4 years ago

netCDF-C support for dates/times/calendars is limited to the ncdump utility, and I believe only as a convenience (nctime0.h isn't installed as part of the library). It has been extended over time to be more flexible with respect to both UDUNITS as well as to better operate with different calendars. I would strongly recommend that its behavior with respect to time be viewed as any other netCDF based client lost in the fog of the various real and imaginary calendars, and not be viewed as a sort of reference implementation (or even guidance) regarding date/time/calendar handling for netCDF files generically.

peterkuma commented 4 years ago

I really appreciate that others have joined the discussion. From my understanding the ISO 8601 standard should be considered as a mapping to the underlying calendar, i.e. year 0 and -1 in ISO 8601 are mapped to year 1 BCE and 2 BCE, respectively. In that sense, there is no conflict between ISO 8601 and the calendar, even though it is slightly confusing. The ISO 8601 document itself is probably a better source of information than Wikipedia. In section B.1 (Date and time representations) of the ISO 8601:2004 version it has an example:

Basic format | Extended format | Explanation
−00020412 | −0002-04-12 | Expanded; four digits to represent the year. The twelfth of April in the second year before the year [0000]

Other than that it is relatively short of good explanation about how negative years should be treated. Two other relevant fragments in the document are:

The newer version of the ISO standard from 2019 defines extensions in ISO 8601-2:2019:

Therefore, it looks like both ways of counting/not counting year 0 are supported by the standard, and they are distinguished by adding "YB" to the year number without leading zeros in the "explicit form".

I think it would be desirable to follow ISO 8601 (but I am not sure about the "YB" form because it may be too complicated to parse all variations of the format), unless there are historical reasons not to, such as how the handling of negative years has been implemented in udunits. I will try to do a short survey of how existing NetCDF libraries implement zero and negative years.

I have been in contact with the developer of Panoply (Robert B. Schmunk, @rschmunk - I hope that is his GitHub username), who said that handling of dates in Panoply follows the standard Java classes, which follow the ISO 8601 convention of including year 0 in the calculations.

@Dave-Allured, I wouldn't be trying to only solve my use case here. I know I could work around it easily by choosing a reference time with a positive year. I am quite interested in solving the issue for all users of NetCDF, as much as I can contribute to the discussion. It looks like there is enough interest from others too.

martinjuckes commented 4 years ago

@peterkuma thanks for the detail, I was looking at ISO 8601-2014 (accessed in 2018) ... and the treatment of years -9999 to 0 has clearly been considerably enhanced in the 2019 version, especially in the extension (8601-2). I've just downloaded a new version.

The NetCDF Java library has quite extensive calendar support .. but the only thing I could find in the NUG that our convention references was a link to the CDL functionality which I've illustrated above. I agree completely with @lesserwhirls that the NetCDF libraries, and libraries in general, should not be treated as a standard or convention, but we have to consider the consequences of recommending anything that conflicts with a widely used library.

Using the new explicit ISO 8601 form for dates may help to make the distinction between the two mappings clearer, with 1YB equivalent to 1 BC and 0000-01-01 as 1 year before 0001-01-01. A complete date and time looks like this: 1985Y4M12DT23H20M30S, which would require some extra parsing code. There is, however, a lot of flexibility in the ISO standard, as you might expect. I have doubts about an approach which would require users to deal with the full range of options.

JonathanGregory commented 4 years ago

Thank you, @peterkuma, for the useful information, and to @martinjuckes for his correction to the question - not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?". I think these are linked in the CF convention. To be precise, I think we could say that the CF calendar attribute indicates both the set of valid dates in the calendar (YYYY-MM-DD) and the specification for mapping a valid date-time to a unique number (i.e. the encoding of date-time as a time coordinate). I appreciate Martin's point that these are two concepts, which is why he prefers a qualifier. However, since you must have both and there are few possible combinations, it feels more robust to me to keep an indivisible attribute for them.

Therefore I suggest that for the default/gregorian/standard calendar, we recommend that the reference date should not be earlier than year 1, and that no date earlier than year 1 should be encoded in this calendar, to avoid the ambiguity. That means the CF-checker would produce a warning (but it's not an error). We shouldn't retrospectively define what negative years mean because we don't know the intention of existing data. I suggest that we define two new calendars:

Any other calendar where the same ambiguity exists could have the same treatment, if there are use-cases which need them. This would be for Julian and proleptic Gregorian calendars, since the other two are model calendars in which I think we can assume year 0 is valid.

Dave-Allured commented 4 years ago

Well I see that my feeble attempts to avoid the year zero issue have failed. I think that the focus in this conversation is good, and I hope we can reach a clear resolution for all six CF defined calendars. Thanks everyone so far, for your research and attention to detail.

I support the ISO 8601 "expanded representation" approach for the interpretation of year numbering. ISO 8601 deals not specifically with calendar definitions, but rather with how to construct string representations. This is relevant for the reference time in the CF units string. As reported by @peterkuma, this representation puts year numbers on a mathematically normal integer time axis that includes negative years and year zero.

@martinjuckes, I would prefer to stay with the familiar year-month-day string syntax, and avoid the new explicit ISO 8601 form that adds new designators such as "Y" and "YB". I think it will be sufficient to simply add explicit documentation for the proper CF treatment of negative and zero year numbers in the units string.

I favor adding constraints for two of the CF calendars. This is a different way of avoiding the year zero issue, specifically an attempt to keep most common applications "safe" from crossover problems. The calendar named "Gregorian" should be restricted to only dates from 1582 October 15 forward. This would apply to both the reference date and all encoded dates. This should be fully compatible with existing data sets that have paid any attention to stated best practices for many years now. Likewise, the calendar named "Julian" should be restricted to only 0001 January 1 forward.

As a result, the need to clarify negative and zero years would be reduced to only the remaining four CF calendars: 360_day, 365_day, 366_day, and proleptic_gregorian.

JonathanGregory commented 4 years ago

Dear @Dave-Allured et al.

We must have had a similar discussion some time ago - it feels familiar! While I appreciate wanting to avoid the Julian-Gregorian transition, I don't think we should disallow the default/standard calendar before 1582. This calendar has always been clearly defined as the mixed Julian/Gregorian calendar; it's the real-world calendar, and we can't exclude a need for real-world time axes which cross the transition.

The Gregorian calendar is undefined before 1582. Possibly we could redefine gregorian in the way Dave suggests (not allowing encoded dates or reference dates before 1582). That would give it a different meaning from default/standard in future data. This change could be a pitfall for interpreting any existing data which says calendar="gregorian" for time coordinates before 1582, but I think gregorian is less likely to have been used than standard or default since it is truly not Gregorian for such times!

However, in view of this point of Dave's, I'd like to change my proposal for new calendar names to:

For years>0, both of these calendars are the same as the default/standard calendar. In that calendar, years<1 should be deprecated.

For julian and proleptic_gregorian, years before 1 should be deprecated, and we could define _withzero and _nozero variants correspondingly if they are needed.

For noleap=365_day, all_leap=366_day and 360_day, I think we could assume that year zero and negative years are allowed. The current definitions describe them as "Gregorian" calendars, which isn't really a useful statement! I would redefine them as calendars in which months have the same lengths in every year. In noleap, the month lengths are as for a non-leap year of the Gregorian calendar, in all_leap, they are as for a leap year, and in 360_day all months have 30 days.
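These three model calendars are simple enough to sketch directly. The code below is an illustration only: it assumes year zero and negative years are valid, as suggested above, and that times are counted in whole days since 0000-01-01. The dictionary and function names are hypothetical.

```python
# Month lengths are the same in every year of each calendar.
MONTH_LENGTHS = {
    "noleap":   [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "all_leap": [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "360_day":  [30] * 12,
}


def decode(days, calendar):
    """Convert whole days since 0000-01-01 to (year, month, day).

    Floor division makes negative offsets fall into negative years, with
    year 0 treated as an ordinary year.
    """
    lengths = MONTH_LENGTHS[calendar]
    year, day_of_year = divmod(days, sum(lengths))
    month = 1
    for length in lengths:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return year, month, day_of_year + 1


print(decode(59, "noleap"))     # (0, 3, 1)
print(decode(59, "all_leap"))   # (0, 2, 29)
print(decode(-1, "360_day"))    # (-1, 12, 30)
```

Because every year has the same length, there is no discontinuity at year zero and the arithmetic never needs a special case.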

Best wishes

Jonathan

martinjuckes commented 4 years ago

Dear @Dave-Allured , @JonathanGregory ,

I agree with Jonathan's point that someone may want to encode real world data from the 16th century (e.g. weather records from 16th century diaries) .. and so we should maintain the existing support for using the actual mixed Julian/Gregorian calendar.

For 365_day I agree that the current definition is confusing -- how about "All years are 365 days with months as in a non-leap year of the Gregorian calendar", and a similar statement for 366_day.

I agree with Dave's recommendation to avoid the new ISO 8601-2 explicit form for dates (YB etc). I think we should spell out what we are prepared to accept. E.g. Would we accept the basic form which has no separators within date and time (e.g. 19850412T101530)? If the basic form is allowed, it is necessary, to avoid confusion, that a fixed number of digits be used for the years (4 by default, but it can be expanded as long as the length is agreed between parties exchanging data). If we stick to what they call the extended form (e.g. 1985-04-12T10:15:30) the number of digits in the year can be varied without ambiguity.
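The difference between the two forms can be sketched with two small parsers. These are illustrations, not the ISO 8601 grammar or any library's behaviour: `parse_basic` needs the agreed year width passed in as a parameter, while `parse_extended` can read a year of any length because the delimiters mark its end. Both function names are hypothetical, and only the date part is handled.

```python
import re

EXTENDED = re.compile(r"^([+-]?\d+)-(\d{1,2})-(\d{1,2})$")


def parse_extended(text):
    """Parse Y-M-D; the year may have any number of digits and a sign."""
    match = EXTENDED.match(text)
    if match is None:
        raise ValueError(f"not an extended-format date: {text!r}")
    return tuple(int(group) for group in match.groups())


def parse_basic(text, year_digits=4):
    """Parse YYYYMMDD; only works if the year width is known in advance."""
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    if not digits.isdigit() or len(digits) != year_digits + 4:
        raise ValueError(f"expected {year_digits} year digits: {text!r}")
    year = sign * int(digits[:year_digits])
    month = int(digits[year_digits:year_digits + 2])
    day = int(digits[year_digits + 2:])
    return year, month, day


print(parse_basic("19850412"))                    # (1985, 4, 12)
print(parse_extended("1985-04-12"))               # (1985, 4, 12)
print(parse_extended("-12000-01-01"))             # (-12000, 1, 1)
print(parse_basic("0120000101", year_digits=6))   # (12000, 1, 1)
```

Without the `year_digits` agreement, a bare digit string cannot be split reliably, which is the fragility in the basic form described above.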

There is currently a special meaning attached to reference dates of the form 0-1-1, for backward compatibility with COARDS, in section 7.4. Can we remove this feature from CF-1.9?

Dave-Allured commented 4 years ago

There is currently a special meaning attached to reference dates of the form 0-1-1, for backward compatibility with COARDS, in section 7.4. Can we remove this feature from CF-1.9?

That troublesome usage of 0-1-1 is already deprecated by the wording of 7.4 in all CF versions. That section also limits the special meaning to only the "real-world calendar". So if we can get consensus that dates earlier than year 1 are not valid in the existing standard and julian calendars under CF, I think it will be safe to leave the special meaning as deprecated, rather than removing it.

Section 7.4 also includes the only explicit treatment of the year zero concept in the entire document. Year 0 may be a valid year in non-real-world calendars ... This supports the possibility of explicit treatment of zero and negative years in alternate calendars.

JonathanGregory commented 4 years ago

Dear @martinjuckes and @Dave-Allured

For 365_day I agree that the current definition is confusing -- how about "All years are 365 days with months as in a non-leap year of the Gregorian calendar", and a similar statement for 366_day.

Yes. I agree.

I agree with Dave's recommendation to avoid the new ISO 8601-2 explicit form for dates (YB etc).

I do too.

I think we should spell out what we are prepared to accept.

Yes. Since CF generally supports udunits formats for units, we ought at least to allow what udunits does for time, but I haven't found out in the units documentation what formats it accepts. I think it will allow Y[-M[-D [h[:m[:s]]]]], where Y can be a large positive or negative number or zero. (NB udunits itself only handles the real-world calendar, but we use its format for the others.)
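A parser for the Y[-M[-D [h[:m[:s]]]]] pattern might look like the sketch below. It is an assumption about what such a format could accept, not a description of the actual udunits grammar: the regular expression, the function name `parse_units`, and the defaults for omitted fields are all illustrative, and time zones are ignored.

```python
import re

# "<unit> since Y[-M[-D[ h[:m[:s]]]]]", with "T" also accepted before the time.
UNITS = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+"
    r"(?P<year>[+-]?\d+)"
    r"(?:-(?P<month>\d{1,2})"
    r"(?:-(?P<day>\d{1,2})"
    r"(?:[ T](?P<hour>\d{1,2})"
    r"(?::(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2}(?:\.\d+)?))?)?)?)?)?\s*$"
)


def parse_units(units):
    """Split a CF-style time units string into its numeric components.

    Returns (unit, year, month, day, hour, minute, second). Omitted fields
    default to the start of the period (an assumption of this sketch).
    """
    match = UNITS.match(units)
    if match is None:
        raise ValueError(f"unrecognised time units: {units!r}")
    g = match.groupdict()
    return (
        g["unit"],
        int(g["year"]),            # may be zero or negative
        int(g["month"] or 1),
        int(g["day"] or 1),
        int(g["hour"] or 0),
        int(g["minute"] or 0),
        float(g["second"] or 0),
    )


print(parse_units("days since -0001-01-01"))
print(parse_units("hours since 1985-04-12T10:15:30"))
```

Parsing only extracts the year number; whether -1 then means 1 BC or 2 BC is exactly the question this issue needs the convention to settle.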

I agree with @Dave-Allured that it's OK to continue to allow but deprecate the special use of year 0 in the real-world calendar.

if we can get consensus that dates earlier than year 1 are not valid in the existing standard (or default) and julian calendars under CF ...

I think that years<1 should be deprecated in these calendars, but not disallowed, because of backward compatibility. What do @martinjuckes and others think?

Above (https://github.com/cf-convention/cf-conventions/issues/298#issuecomment-698987558) I have made proposals for new calendars, going before year 1, and asked whether gregorian should be redefined.

Jonathan

martinjuckes commented 4 years ago

agree with @JonathanGregory on deprecating (rather than disallowing) years < 1 in standard and julian calendars (where I interpret this to refer to the year in the reference time stamp).

I'm not sure about the proposal to redefine gregorian : it is currently defined as mixed Gregorian/Julian which appears OK. I don't have a clear opinion on this.

Concerning what udunits2 supports: the command line tool treats 0-0-0 as equivalent to 1-1-1 and -1-1-1 as being 366 days apart. udunits2 uses a mixed Gregorian/Julian calendar.

The library does accept arbitrary years and ISO basic format. This means that 19850101 is equivalent to 1985 or 1985-01-01. This means that the -MM is not optional when you want to reference years with more than 4 digits. It looks like a rather fragile approach to me, and documentation is lacking. Does anyone know of people wanting to use the ISO basic format, with no delimiters in the date? Could we simplify the specification (and parsing requirements) by insisting on the ISO "extended format", which has - as a delimiter in the date and : in the time?

JonathanGregory commented 4 years ago

Dear @martinjuckes

agree with @JonathanGregory on deprecating (rather than disallowing) years < 1 in standard and julian calendars (where I interpret this to refer to the year in the reference time stamp).

Yes, it would apply to the year in the reference timestamp, and I think it would also mean deprecating any attempt to decode or encode a time before year 1. The CF checker would be able to detect such years in the time coordinate, and should give a warning about it, because their meaning would be unreliable.
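The kind of check described here could be sketched as follows. This is not code from the CF checker; the function name, the message text and the set of affected calendars are assumptions that simply mirror the proposal as discussed in this thread (deprecate, but do not forbid, years before 1 in the standard and julian calendars).

```python
# Calendars in which years < 1 would be deprecated under the proposal
# discussed here (an assumption of this sketch, not a released CF rule).
DEPRECATED_BEFORE_YEAR_1 = {"standard", "gregorian", "julian"}


def check_year(year, calendar="standard"):
    """Return a list of warning messages for a reference or decoded year."""
    messages = []
    if calendar in DEPRECATED_BEFORE_YEAR_1 and year < 1:
        messages.append(
            f"WARNING: year {year} in calendar '{calendar}' is deprecated; "
            "its meaning is ambiguous"
        )
    return messages


print(check_year(-4713, "julian"))
print(check_year(0, "proleptic_gregorian"))   # []
print(check_year(1850))                       # []
```

Returning warnings rather than raising an error matches the intent that such files remain readable for backward compatibility.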

I'm not sure about the proposal to redefine gregorian : it is currently defined as mixed Gregorian/Julian which appears OK. I don't have a clear opinion on this.

My suggestion would be to make gregorian different from standard and default, by deprecating times before the change of calendar in 1582 for gregorian (rather than year 1 for the others). If you say it's Gregorian (rather than mixed or proleptic_gregorian), it really should not exist before that calendar was introduced!

Could we simplify the specification (and parsing requirements) by insisting on the ISO "extended format", which has - as a delimiter in the date and : in the time?

Alternatively we could define what it means if you supply a date consisting of more than eight digits and no delimiters. But that would imply a requirement on software to support our interpretation. Maybe we could deprecate it instead of disallowing it.

Jonathan

larsbarring commented 4 years ago

My suggestion would be to make gregorian different from standard and default, by deprecating times before the change of calendar in 1582 for gregorian (rather than year 1 for the others). If you say it's Gregorian (rather than mixed or proleptic_gregorian), it really should not exist before that calendar was introduced!

:+1: for this, and in particular for the last sentence.

Dave-Allured commented 4 years ago

I think it would help to confine this discussion to the requested issue, which is zero and negative years in currently defined calendars. A refinement of the "Gregorian" label is a good topic, but I should not have injected it into this discussion.

Also a full discussion of alternate date formats in the reference string is complicated. Can we please defer that to a future issue, when needed?

I wholeheartedly support new calendar names that are explicit and mathematically well-defined. It is relevant to mention those ideas here. However, can we also put off their resolution to new issues, as needed?

Let's see if we have some consensus so far on the following, acknowledging some previous agreement above.

Proleptic_gregorian is problematical. Let's talk about that more later.

Agreed so far?

JonathanGregory commented 4 years ago

Dear @Dave-Allured

Thanks for the summary. Yes, I agree with all those bullet points. I would like to add

Jonathan

Dave-Allured commented 4 years ago

@JonathanGregory, I agree with your addition. That was my intention, I just did not fold that into the wording correctly.

larsbarring commented 3 years ago

I have not much to add to this discussion; my earlier comment was only to express my support for the suggestion to make the calendar names/terms clearer and more self-evident.

For what it is worth, @Dave-Allured's summary and @JonathanGregory's addition look good to me.

Dave-Allured commented 3 years ago

I am coming around to favoring a partial ISO 8601-2:2019 approach as described above by @peterkuma. Both ways of either counting or not counting year 0 could be supported with some minimal extension of the reference date notation, as initially suggested above by @martinjuckes. I have a suggested notation that I would like to hold for later.

Let's continue to focus on the primary question of year numbers in the traditional CF format, without any new notation. By ISO 8601-2:2019, and if we agree, year zero and negative years are included.

Now the interpretation for proleptic_gregorian is still undecided. I suggest that the current, unadorned proleptic_gregorian should include year zero and negative years for general scientific usage. I do not know of any data sets that have encoded zero or negative years in a conflicting way with proleptic_gregorian. Also there is precedent for this outside of CF; see the Wikipedia article.

@JonathanGregory, you proposed that years before 1 should be deprecated for proleptic_gregorian. Do you have a specific reason for preferring this?

martinjuckes commented 3 years ago

Hello @Dave-Allured

I also agree on the suggested approach to supporting zero and negative reference years with explicit specifications, and deprecating them in some cases.

Also tend to favour allowing negative years in the proleptic_gregorian, as it appears to be designed for continuity going backwards in time.

JonathanGregory commented 3 years ago

you proposed that years before 1 should be deprecated for proleptic_gregorian. Do you have a specific reason for preferring this?

Only the supposition that it might be not well-defined what year zero means. If the consensus is that year 0 is a normal year in the proleptic Gregorian calendar, I think that's good. We can allow zero and negative years for this calendar.

The withzero and nozero options are most relevant for the Julian and standard calendar, I suppose.

marqh commented 3 years ago

I believe that ISO8601 is as good a definition as we have for the datetime stamp, the Gregorian calendar, and the Proleptic Gregorian calendar.

ISO8601 is explicit in the inclusion of year 0000 and its interpretation.

Years prior to 1583 are not automatically allowed by the standard. Instead "values in the range [0000] through [1582] shall only be used by mutual agreement of the partners in information interchange."

CF is a good example of mutual agreement between partners.

An expanded year representation [±YYYYY] is available, again by mutual agreement, and it must be prefixed with a + or − sign. By convention 1 BC is labelled +0000, 2 BC is labelled −0001.
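The mapping between ISO 8601 year numbers and BC/AD labels is a one-line offset. A minimal sketch, with hypothetical function names:

```python
def iso_year_to_era(iso_year):
    """Map an ISO 8601 (astronomical) year number to (year, era)."""
    if iso_year >= 1:
        return iso_year, "AD"
    return 1 - iso_year, "BC"      # 0 -> 1 BC, -1 -> 2 BC, ...


def era_to_iso_year(year, era):
    """Map a (year, era) label back to an ISO 8601 year number."""
    if year < 1:
        raise ValueError("era years start at 1; there is no year 0 BC or AD")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError(f"unknown era: {era!r}")


print(iso_year_to_era(0))          # (1, 'BC')
print(iso_year_to_era(-1))         # (2, 'BC')
print(era_to_iso_year(4713, "BC")) # -4712
```

The last line reproduces the value quoted earlier in the thread for the start of the Julian Period, 4713 BC (−4712).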

Dave-Allured commented 3 years ago

@marqh, I am proposing only one idea from ISO8601, the mapping of zero and negative year numbers as you just showed. Year 0 = 1 BC, etc. I am not proposing a full adoption of an ISO8601 format. ISO8601 uses fixed length numbers, whereas the CF date/time stamp allows variable length numbers with delimiters. Also, CF does not use the plus sign.

The delimited system is robust and has served us well for a long time. The CF delimited system accommodates ISO8601 fixed width formats when the standard delimiters including the "T" separator are used. E.g., YYYY-MM-DDTHH:MM:SS is correct under both systems.

semmerson commented 3 years ago

The UDUNITS-2 library could be modified to interpret the timestamp in a time "unit" using the proleptic Gregorian calendar rather than the currently-used hybrid Julian/Gregorian calendar.

The question is whether or not this would be a good idea.

Regards, Steve Emmerson UDUNITS Developer



JonathanGregory commented 3 years ago

Dear @Dave-Allured

I wonder whether you have time to restate your proposal? It looks like it was very nearly concluded in October. I don't see any outstanding disagreements. It would be good if we could push it over the finishing-line.

Jonathan

Dave-Allured commented 3 years ago

Sorry for the delay. Here is a new and simplified proposal for zero and negative year numbers in time coordinates. I think this represents a consensus of the current discussion. I avoided several side issues that were discussed, but are not directly relevant.

Dave-Allured commented 3 years ago

See pull request #315 for the actual changes. I also added this minor wording change to clarify the exact meaning in some places.

Dave-Allured commented 3 years ago

The UDUNITS-2 library could be modified to interpret the timestamp in a time "unit" using the proleptic Gregorian calendar rather than the currently-used hybrid Julian/Gregorian calendar.

The question is whether or not this would be a good idea.

@semmerson, the default behavior of UDUNITS 1 and 2 should stay the way it is. I am afraid that use of reference dates earlier than the crossover, contrary to two decades of warnings, has been widespread. UDUNITS is used for time conversion in large amounts of user code and software tools.

jswhit commented 3 years ago

The cftime Python module currently does not allow year zero for the Julian, mixed or proleptic_gregorian calendars, but does allow negative years (back to Jan 1 of year -4713 for Julian and mixed, and Nov 24 of year -4714 for proleptic_gregorian, so that days since -4713-1-1-12 can be used for computing the Julian day). If this proposal is adopted, this would need to change.
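The two limits quoted above are the same instant, the Julian day epoch, written in two calendars. A minimal sketch using the widely published integer formula for the Julian Day Number shows this; the function names are hypothetical and this is not cftime code. The formula takes years that include year zero, so no-year-zero labels are shifted first.

```python
def to_astronomical(year):
    """Map a no-year-zero label (..., -2, -1, 1, 2, ...) to numbering with year zero."""
    if year == 0:
        raise ValueError("year zero does not exist in this numbering")
    return year + 1 if year < 0 else year


def julian_day_number(year, month, day, calendar):
    """Julian Day Number at noon of the given date.

    `year` uses numbering that includes year zero. The formula is intended
    for years from -4800 onward.
    """
    a = (14 - month) // 12        # 1 for January and February, else 0
    y = year + 4800 - a           # shifted year, counted from March
    m = month + 12 * a - 3        # March = 0, ..., February = 11
    days = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if calendar == "julian":
        return days - 32083
    if calendar == "proleptic_gregorian":
        return days - y // 100 + y // 400 - 32045
    raise ValueError(f"unsupported calendar: {calendar!r}")


# Both no-year-zero limits map to Julian Day Number 0.
print(julian_day_number(to_astronomical(-4713), 1, 1, "julian"))
print(julian_day_number(to_astronomical(-4714), 11, 24, "proleptic_gregorian"))
```

Reading the label -4713 as a year that includes year zero, without the shift, gives a Julian Day Number of -365 for the Julian calendar date, which is the one-year discrepancy this issue is about.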

What is the rationale for allowing year zero in proleptic_gregorian, but not in the Julian and mixed calendars? It seems more consistent to me to disallow year zero for all 'real-world' calendars.

JonathanGregory commented 3 years ago

Dear @Dave-Allured

Thanks for repeating the proposal and preparing the pull request. I agree with the choices you propose and the wording. After the deprecation of year zero in reference date/time, I'd like to add the following for the avoidance of doubt. Alternatively it could be inserted after "prohibited for certain calendars, as noted below."

Date/times in zero or negative years are prohibited by calendars which prohibit these years in the reference date/time. In these calendars, it is an error to store or decode a time coordinate value for a date/time earlier than 1-1-1 0:0:0, regardless of the reference date/time in the time units.

I also agree with your deprecation of the year 0 climatological convention, but it is stronger than what section 7.4 currently says, which is, "We do not recommend this convention." That sounds the same, but it's not currently noted in the conformance document for section 7.4. Therefore I think we must either not deprecate it in 4.4, or deprecate it in 7.4, to be consistent.

We also need to include corresponding changes to the conformance document in the pull request. I think it's in the same repo now, isn't it, for this reason?

Re the question from @jswhit, I too asked about year zero in the proleptic Gregorian calendar earlier. You said that year zero is conventionally allowed in that calendar. I'm happy to take your word for that.

Best wishes

Jonathan

JonathanGregory commented 3 years ago

Another small suggestion: I think date-time with a hyphen would be better than date/time with a slash. I understand "/" to indicate alternatives, whereas "-" joins words. I am going to change the label of this issue to enhancement, which is what it has become. Jonathan

jswhit commented 3 years ago

Regarding the use of year zero in the proleptic_gregorian calendar, I understand now why @Dave-Allured proposed allowing this. One of the few places proleptic_gregorian is used is in ISO-8601, which specifies that year 0 is 1 BC.

Dave-Allured commented 3 years ago

@jswhit, sorry for the delay in responding.

What is the rationale for allowing year zero in proleptic_gregorian, but not in the Julian and mixed calendars? It seems more consistent to me to disallow year zero for all 'real-world' calendars.

That is a fair question with a complicated answer. Here are the main arguments from my viewpoint.

JonathanGregory commented 3 years ago

I agree with @Dave-Allured's arguments about proleptic Gregorian versus standard and Julian calendars. If there is a use-case for the Julian calendar before year 1, which is disallowed by the present proposal, we can introduce another calendar for it in a later proposal. Jonathan

Dave-Allured commented 3 years ago

I need to further address backward incompatibility for the current proposal to include year zero in the unadorned label proleptic_gregorian. It has already been said that there are conflicting interpretations between several software packages. @jswhit mentioned this above regarding the cftime python module. There is further concern in cftime issue #233. Cftime is not the only library that uses no-year-zero encoding.

My opinion is that this particular problem is limited and manageable, that we should proceed to define proleptic_gregorian to include year zero for CF purposes, and that this will result in a simpler future. Here are some supporting arguments.

Dave-Allured commented 3 years ago

@JonathanGregory said:

Re the question from @jswhit, I too asked about year zero in the proleptic Gregorian calendar earlier. You said that year zero is conventionally allowed in that calendar. I'm happy to take your word for that.

Careful, I don't think I actually said that. All that I said was "there is precedent for [including year zero] outside of CF; see the Wikipedia article." If there was a universal governing standard for Proleptic Gregorian, this conversation would be easier. I have found no such standard. ISO-8601 certainly does not count; they are a latecomer to the Proleptic Gregorian party. Unidata/netCDF has declined authority.

My sense is that there is a slow convergence within scientific and technical communities, on including year zero in the calendar called Proleptic Gregorian, for record keeping purposes. Astronomical year numbering is the most long standing example. My opinion is that CF should join this trend.

JonathanGregory commented 3 years ago

Dear @Dave-Allured

Thanks for your arguments. I agree we need to make a clear choice, and I'm happy with our following or setting a trend to define proleptic_gregorian (with no qualification) as including year zero.

At https://github.com/cf-convention/cf-conventions/issues/298#issuecomment-794196013 I made some other comments on your proposed text.

Best wishes

Jonathan

Dave-Allured commented 3 years ago

@JonathanGregory, thanks for working on the details. Sorry about this late reply. Referring again to PR #315:

... After the deprecation of year zero in reference date/time, I'd like to add the following for the avoidance of doubt. Alternatively it could be inserted after "prohibited for certain calendars, as noted below."

Date/times in zero or negative years are prohibited by calendars which prohibit these years in the reference date/time. In these calendars, it is an error to store or decode a time coordinate value for a date/time earlier than 1-1-1 0:0:0, regardless of the reference date/time in the time units.

This seems to be fully redundant with my current wording. In each of the appropriate calendar descriptions, I already have:

Date/times earlier than 1-1-1 0:0:0 are prohibited.

Then at the bottom, I have:

Some calendars have a restricted time range, as noted, to avoid multiple interpretations. These restrictions apply to both the reference date/time string, and to encoded time values.

Perhaps you missed this bottom part? Should this be repositioned to make it easier to see? Note also that I am making some effort to keep the wording reasonably concise, and to avoid unnecessary repetition.

Dave-Allured commented 3 years ago

I also agree with your deprecation of the year 0 climatological convention, but it is stronger than what section 7.4 currently says, which is, "We do not recommend this convention." That sounds the same, but it's not currently noted in the conformance document for section 7.4. Therefore I think we must either not deprecate it in 4.4, or deprecate it in 7.4, to be consistent.

Agreed. I reverted to "not recommended". Please see new wording in #315. Your further suggestions are welcome.

Dave-Allured commented 3 years ago

Another small suggestion: I think date-time with a hyphen would be better than date/time with a slash. I understand "/" to indicate alternatives, whereas "-" joins words.

Date/time with a slash is consistent with long-standing current usage, and also with related external usage. This is a minor style issue. Either spelling would be reasonable. Guidance for forming these kinds of compounds is nuanced. Please move this to a new issue if you want to pursue it. Thank you.

JonathanGregory commented 3 years ago

Dear @Dave-Allured

You're right, I didn't notice this statement further down:

Some calendars have a restricted time range, as noted, to avoid multiple interpretations. These restrictions apply to both the reference date/time string, and to encoded time values.

Perhaps you missed this bottom part? Should this be repositioned to make it easier to see? Note also that I am making some effort to keep the wording reasonably concise, and to avoid unnecessary repetition.

Yes, I do think it would be better further up, at the end of the year-zero paragraph, so all the general points are made together. Since it applies to any restriction on dates, not just year zero but also e.g. there is no 31st of any month in the 360-day calendar, I suggest stating it more generally - and concisely:

Dates which are not permitted in the calendar of the time coordinate variable cannot be encoded as time coordinate values and must not be used in the reference date/time string.
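A checker for that general rule might look like the sketch below. It covers only four calendars, and the names are hypothetical. The year restrictions follow the proposal as discussed in this thread (no dates earlier than 1-1-1 in the standard and Julian calendars, year zero allowed in proleptic_gregorian). Leaving years unrestricted for 360_day is an assumption of the sketch.

```python
# Calendars that prohibit date/times earlier than 1-1-1 (per this thread).
NO_YEAR_ZERO = {"standard", "julian"}
SUPPORTED = NO_YEAR_ZERO | {"proleptic_gregorian", "360_day"}
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap(calendar, year):
    """Leap-year rule; the mixed calendar uses the Julian rule before 1582."""
    if calendar == "julian" or (calendar == "standard" and year < 1582):
        return year % 4 == 0
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_legal_date(calendar, year, month, day):
    """Return True if the date exists in the given calendar."""
    if calendar not in SUPPORTED:
        raise ValueError(f"unsupported calendar: {calendar!r}")
    if calendar in NO_YEAR_ZERO and year < 1:
        return False
    if not 1 <= month <= 12:
        return False
    if calendar == "360_day":
        return 1 <= day <= 30
    # The mixed calendar skips 5-14 October 1582 at the crossover.
    if calendar == "standard" and (year, month) == (1582, 10) and 5 <= day <= 14:
        return False
    length = MONTH_LENGTHS[month - 1]
    if month == 2 and is_leap(calendar, year):
        length = 29
    return 1 <= day <= length


print(is_legal_date("proleptic_gregorian", 0, 2, 29))  # year zero, leap day
print(is_legal_date("standard", 0, 1, 1))              # earlier than 1-1-1
print(is_legal_date("360_day", 2000, 1, 31))           # no 31st in 360_day
```

The same function can be applied to both the reference date/time in the units string and to decoded time coordinate values, since the rule is meant to cover both.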

There is one occurrence of "date/time" in the present version of the standard and none of "date-time" so I won't dispute it! I will note that https://github.com/cf-convention/cf-conventions/pull/316 should be amended.

I note that "preceding" has a typo.

Please add yourself to the list of authors.

Best wishes and thanks

Jonathan

Dave-Allured commented 3 years ago

@JonathanGregory, I moved that paragraph and somewhat reworded your suggestion. I also updated the conformance document. Please review the latest changes in PR #315.

My changes on this ticket are fine tuning of previous work, nothing more. This is not enough to qualify me as an author, so I did not add to the author list. But thank you for your consideration.

JonathanGregory commented 3 years ago

@Dave-Allured, thank you for those changes. I agree with all that. In the conformance document, I think we also need

The reference date/time in the units must be a legal date/time in the specified calendar.

which could be a separate statement from, or possibly combined with, your statement

Encoded time coordinates must be legal date/time values in the specified calendar.

Jonathan

Dave-Allured commented 3 years ago

The reference date/time in the units must be a legal date/time in the specified calendar.

Hmmm. This line is already present in CF 1.8 conformance. But it is under section 4.4, not 4.4.1. It looks like the previous authors wanted to address the units attribute in 4.4, and the calendar attribute in 4.4.1. However, this does seem a little odd, splitting up related content restrictions like this. What do you suggest?