citation-style-language / documentation

Citation Style Language documentation
http://citationstyles.org/
Creative Commons Attribution Share Alike 4.0 International

More complete EDTF data model support #145

Open cormacrelf opened 3 years ago

cormacrelf commented 3 years ago

Upcoming CSL-JSON changes include support for EDTF as a date input format. I recently implemented EDTF, and I have some thoughts about how we can make use of its features in CSL.

What EDTF has that we don't

EDTF is a great format for CSL, because we have supported date ranges since forever, and some of the unofficial date formats we use resemble EDTF already. However, it adds four new things we did not have before.

  1. Unspecified parts of dates, using the X character to blot them out.
  2. A flag for "approximate" in addition to "uncertain".
  3. A datetime representation, e.g. 2019-07-16T01:57:29Z.
  4. A defined calendar.

Unspecified date parts / 1999-XX and friends

You might think that we could just add terms for month-unspecified and day-unspecified and call it a day. But I think we'd be missing out -- the spec doesn't advertise it very well, but the feature is more expressive than that.

There are a few different variations on the XX in EDTF level 1. In my opinion the spec should have named them like so: 19XX => century, 199X => decade, 1999-XX => month of year, 1999-XX-XX => day of year, 1999-07-XX => day of month. Styles/locales could render 19XX as "20th century" or "1900s" if they so wished! However, given this is academic citation, I'm not sure how useful that would be. If anyone can point to a style that might want special rendering for any of these forms, then it's something we can definitely do.
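To make the five forms concrete, here is a minimal sketch of how a processor might tell them apart. The labels are the informal names proposed above, not terms from the EDTF spec, and `unspecified_kind` and `render_century` are hypothetical helpers, not part of any existing CSL implementation.

```python
import re

# Labels follow the informal naming proposed above; they are not EDTF spec terms.
_PATTERNS = [
    (re.compile(r"\d{2}XX"), "century"),
    (re.compile(r"\d{3}X"), "decade"),
    (re.compile(r"\d{4}-XX"), "month of year"),
    (re.compile(r"\d{4}-XX-XX"), "day of year"),
    (re.compile(r"\d{4}-\d{2}-XX"), "day of month"),
]


def unspecified_kind(edtf):
    """Return which part of the date is unspecified, or None if no listed form matches."""
    for pattern, label in _PATTERNS:
        if pattern.fullmatch(edtf):
            return label
    return None


def render_century(edtf):
    """Render a form like 19XX as an English ordinal century."""
    n = int(edtf[:2]) + 1
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix} century"


print(unspecified_kind("1999-XX"))  # month of year
print(render_century("19XX"))       # 20th century
```

A locale could just as well map the "century" case to "1900s"; the point is only that the form of the string carries enough information to choose.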

Approximate

We currently have is-uncertain-date, the circa term, and "circa": true in CSL-JSON. For reference, EDTF encodes its uncertainties as ? => uncertain, ~ => approximate, % => both.

On a basic level, you could add terms for approximate and approximate-uncertain, and also add is-approximate-date="issued" as a conditional test.

One complication is that EDTF makes approx/uncertain a property of each end of a date range, i.e. you can have 1999?/2003 meaning (uncertain 1999) to 2003. Our current model is insufficient for that, because it can only work with a date as a whole. You could therefore add a certainty date part as well, which simply renders one of the three terms or nothing, in either the single date or on each end of the range. This would be an improvement over the existing syntax even ignoring the approximate addition.
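A rough sketch of what per-endpoint certainty could look like as data, assuming only the trailing ?, ~ and % qualifiers of EDTF level 1. The `Endpoint` type, the function names and the term names returned by `certainty_term` are placeholders for illustration; open-ended ranges are not handled.

```python
from dataclasses import dataclass

# Trailing qualifier -> (uncertain, approximate)
QUALIFIERS = {"?": (True, False), "~": (False, True), "%": (True, True)}


@dataclass
class Endpoint:
    date: str
    uncertain: bool = False
    approximate: bool = False


def parse_endpoint(text):
    """Split one date into its bare form and its certainty flags."""
    if text and text[-1] in QUALIFIERS:
        uncertain, approximate = QUALIFIERS[text[-1]]
        return Endpoint(text[:-1], uncertain, approximate)
    return Endpoint(text)


def parse_date_or_range(text):
    """Return one Endpoint for a single date, two for a start/end range."""
    return [parse_endpoint(part) for part in text.split("/")]


def certainty_term(endpoint):
    """Pick the (placeholder) term a certainty date part would render, or None."""
    if endpoint.uncertain and endpoint.approximate:
        return "approximate-uncertain"
    if endpoint.approximate:
        return "approximate"
    if endpoint.uncertain:
        return "uncertain"
    return None


start, end = parse_date_or_range("1999?/2003")
print(start.date, certainty_term(start))  # 1999 uncertain
print(end.date, certainty_term(end))      # 2003 None
```

Because the flags live on each endpoint, a single date is just the one-element case and needs no separate code path.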

Date time representation

My favourite citation style, AGLC4, now supports citing tweets/forum posts/videos, and requires a timestamp as well as a date. It renders them like so:

Social media posts, forum posts and online videos uploaded to sites such as YouTube may be cited as follows:

Username, Title (Social Media Platform, Full Date, Time) <URL>.

... The time zone from which the post is accessed (eg ‘AEDT’) should be included if the social media platform adjusts the time based on the local time zone.

@s_m_stephenson (Scott Stephenson) (Twitter, 17 July 2017, 9:37pm AEST) <https://twitter.com/s_m_stephenson/status/887169425551441921>, ....

I don't think this will be the only one out there. We don't currently support times at all, and I think we should.
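For illustration, a minimal sketch of rendering a datetime in the "Full Date, Time" shape of the AGLC4 example, using only the Python standard library. The UTC input below is invented so that it lands on the local time in the example, and the fixed +10:00 offset labelled AEST is an assumption; a real implementation would need proper time zone data and locale-aware month names.

```python
from datetime import datetime, timedelta, timezone

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def render_timestamp(iso_utc, zone):
    """Render an ISO 8601 UTC datetime as 'D Month YYYY, h:mmam/pm ZONE'."""
    # Replace a trailing "Z" by hand: fromisoformat only accepts it from Python 3.11.
    moment = datetime.fromisoformat(iso_utc.replace("Z", "+00:00")).astimezone(zone)
    hour12 = moment.hour % 12 or 12
    meridiem = "am" if moment.hour < 12 else "pm"
    date_part = f"{moment.day} {MONTHS[moment.month - 1]} {moment.year}"
    time_part = f"{hour12}:{moment.minute:02d}{meridiem} {moment.tzname()}"
    return f"{date_part}, {time_part}"


# Fixed-offset stand-in for Australian Eastern Standard Time (assumption).
AEST = timezone(timedelta(hours=10), "AEST")

print(render_timestamp("2017-07-17T11:37:00Z", AEST))  # 17 July 2017, 9:37pm AEST
```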

A couple of notes about this:

A defined calendar

AFAIK CSL has never operated within a specific calendar, it just renders what you put in. EDTF uses the ISO 8601 calendar, see my notes here on what that means: https://docs.rs/edtf/0.2.0/edtf/#notes-on-edtf-and-the-iso-8601-calendar-system. (Obviously you would render these in Gregorian style generally, i.e. 0000 renders as 1 BC, -0099 as 100 BC.) For modern dates, that's the same as we would normally write them, but in some places dates weren't written in the modern Gregorian calendar until the early 1900s (e.g. Russia, 1918). The UK only switched in 1752. That's really not that long ago, especially since some case law/legislation from before then is still cited fairly frequently.
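The year mapping in the parenthesis can be sketched as follows. ISO 8601 years include a year zero, so year 0 is 1 BC and each earlier year is offset by one; `render_year` is a hypothetical helper and the "BC" suffix stands in for whatever era term a locale defines.

```python
def render_year(iso_year):
    """Convert an ISO 8601 year string (with a year zero) to a BC/AD-style year."""
    year = int(iso_year)
    if year > 0:
        return str(year)
    # Year 0 is 1 BC, year -1 is 2 BC, and so on.
    return f"{1 - year} BC"


print(render_year("0000"))   # 1 BC
print(render_year("-0099"))  # 100 BC
```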

Idea 1: Accuracy of old dates

I don't think you'll find any citation styles which dictate what calendar to write dates in, but that isn't to say that the problem doesn't exist; in fact it is probably part of the problem for historians, since nobody is forcing anyone else to write what kind of date something is. We could tip the scales with a very simple feature: a configuration in a style or a locale (?) which sets the start of the modern era for dates. Any date before this could be rendered with a term for new style dates (e.g. (n.s.)), thus forcing people to check that it actually is a new style date.
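The simple version of this is little more than a threshold check. A sketch, assuming a hypothetical `modern_era_start` setting expressed as a year and a placeholder "(n.s.)" term:

```python
def flag_new_style(rendered, year, modern_era_start, term="(n.s.)"):
    """Append a new-style marker to dates before the configured start of the modern era."""
    if year < modern_era_start:
        return f"{rendered} {term}"
    return rendered


# With a threshold of 1752 (the UK switch), older dates get flagged.
print(flag_new_style("1690", 1690, 1752))  # 1690 (n.s.)
print(flag_new_style("1999", 1999, 1752))  # 1999
```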

A much more complex feature would be the configurable rendering of dates in other calendars. I'm pretty sure @fbennett had a feature for rendering the oddities of Japanese calendars, but I'm not sure we should require every CSL implementation to do complex calendar maths. It could be an optional thing. If we wanted such a feature, we could make the Unicode CLDR calendars optional. (Although, CLDR does not include Julian! How did they manage to omit it???)

Idea 2: Days of the week

Again, I don't know if any styles demand this, but until now it has not been technically possible to know which day of the week something is, because CSL didn't define a calendar. If you make CSL calendar aware, you get days of the week for free.
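As a quick illustration of "for free": once a date is pinned to a known calendar, the weekday is a pure function of year, month and day. Python's `datetime.date` uses the proleptic Gregorian calendar (years 1 to 9999 only), so it serves as a stand-in here; `day_of_week` is a hypothetical helper.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_of_week(year, month, day):
    """Return the English weekday name for a proleptic Gregorian date."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return DAY_NAMES[date(year, month, day).weekday()]


print(day_of_week(2017, 7, 17))  # Monday
```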

In summary

EDTF opens up a couple of new opportunities that are worth considering. The most obviously valuable one appears to be datetimes, but there are a lot of possibilities.

bdarcus commented 3 years ago

> EDTF opens up a couple of new opportunities that are worth considering. The most obviously valuable one appears to be datetimes, but there are a lot of possibilities.

Yes, and basic 8601 dates are still valid.

Little thing: I've never understood the uncertain/approximate distinction, at least as it applies here. Do you?

cormacrelf commented 3 years ago

I think it boils down to the words themselves:

cormacrelf commented 3 years ago

If anything "circa" should be for approximation, not uncertainty.

bdarcus commented 3 years ago

> If anything "circa" should be for approximation, not uncertainty.

So then what should a CSL processor do with an uncertain date?

I had wondered if it should treat both as circa, but I guess we can treat them separately in the spec as well, so that a style could output "1521?" or "c. 1521", or even "c. 1521?"?

denismaier commented 3 years ago

Some styles might treat it as meaning the same, but in general I think ca. vs ? sounds reasonable.

bdarcus commented 3 years ago

> Some styles might treat it as meaning the same, but in general I think ca. vs ? sounds reasonable.

Right; so we definitely need to support both explicitly for input (as in edtf) and styles, and of course feature edtf in general prominently in the documentation, once we figure out our plan.

cormacrelf commented 3 years ago

The issue with circa is if you make it synonymous with "approximate", you are left to deal with is-uncertain-date having to be backwards.

bdarcus commented 3 years ago

So just to make sure I understand, @cormacrelf:

> The issue with circa is if you make it synonymous with "approximate", you are left to deal with is-uncertain-date having to be backwards.

You are saying:

  1. edtf approximate = csl circa
  2. but that conflicts with the current csl is-uncertain-date
  3. therefore, the implication is we should change the meaning of is-uncertain-date and add is-approximate-date to csl, and update all existing styles to use the latter instead?

Obviously that could be a little painful, but not that big a problem (to convert the styles is just a simple replacement).

bwiernik commented 3 years ago

I had thought we had already implemented approximate and uncertain?

bdarcus commented 3 years ago

Would be good to clarify. @cormacrelf?