gbif / gbif-api

GBIF API
Apache License 2.0

Representing imprecise dates in the Java object model #36

Open MattBlissett opened 5 years ago

MattBlissett commented 5 years ago

There's some similar discussion on #2, for occurrences.

The motivation is to fix https://github.com/gbif/portal-feedback/issues/1676 and similar issues around dates in metadata properly.

The pubDate and temporalCoverage on a Dataset are very often given only as a year, and we should retain that precision. Changing the JSON response is straightforward enough; I now have:

"pubDate": "2016",
…
"temporalCoverages": [{
    "@type": "range",
    "start": "2013",
    "end": "2015"
}],

instead of the current

"pubDate": "2015-12-31T23:00:00.001+0000",
…
"temporalCoverages": [{
    "@type": "range",
    "start": "2012-12-31T23:00:00.001+0000",
    "end": "2014-12-31T23:00:00.001+0000"
}],

I could fix only the time zone issue in pubDate and keep the extra millisecond, which seems to be an undocumented way of marking a year-precision date. However, that still leaves the end of the range one year too early, though I suppose it could be serialized as 2015-12-31T00:00:00.001+0000.
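To make the time zone issue concrete, here is a small sketch of how a year-only value could end up as 23:00 on the previous day. It assumes the year was expanded to local midnight on a server running at UTC+1 and then serialized in UTC; that offset is my guess from the values above, not something I have confirmed in the registry code. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical demo class, not part of gbif-api.
final class YearShiftDemo {

    // Expands a year to midnight on 1 January at the given offset,
    // adds the 1 ms marker, and returns the result as a UTC instant.
    static Instant yearStartPlusOneMilli(int year, ZoneOffset offset) {
        return LocalDate.of(year, 1, 1)
                .atStartOfDay()      // local midnight
                .atOffset(offset)    // assumed server offset
                .toInstant()         // same moment on the UTC timeline
                .plusMillis(1);      // the year-precision marker
    }
}
```

With an offset of +01:00, the year 2016 comes out as 2015-12-31T23:00:00.001 in UTC, which matches the pubDate shown above. With ZoneOffset.UTC it stays on 2016-01-01.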

Anyway, for this I used a TemporalAccessor, since it can represent a Year, YearMonth, LocalDate (= year, month and day), etc. However, there's a strong warning against using this interface in application code, since it can also represent things like JapaneseDate, which break the usual assumptions we have about ISO dates. It's also a bit cumbersome to use: getting the year means first checking that the value holds a year, then requesting it. So it's much looser than we require.
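For illustration, this is roughly what client code has to do with a TemporalAccessor. It is only a sketch; the class and method names are hypothetical, and only the java.time calls are real.

```java
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical helper, not part of gbif-api.
final class TemporalAccessorDemo {

    // Reading the year takes two steps: check support, then fetch.
    static int yearOf(TemporalAccessor value) {
        if (!value.isSupported(ChronoField.YEAR)) {
            throw new IllegalArgumentException("No year in " + value);
        }
        return value.get(ChronoField.YEAR);
    }

    // The precision has to be inferred by probing fields, finest first.
    static String precisionOf(TemporalAccessor value) {
        if (value.isSupported(ChronoField.DAY_OF_MONTH)) {
            return "day";
        }
        if (value.isSupported(ChronoField.MONTH_OF_YEAR)) {
            return "month";
        }
        if (value.isSupported(ChronoField.YEAR)) {
            return "year";
        }
        return "unknown";
    }
}
```

Nothing in these signatures stops a caller from passing a LocalTime or a non-ISO calendar type, which is the looseness I mean.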

I think @mdoering's suggestion in #2, creating an IsoDate class, makes the most sense. This would represent only a single date (not a date range), as a year, a year and month, or a year, month and day. I think it would be a fairly simple wrapper around Year, YearMonth, and LocalDate, e.g. returning the most precise available, or fetching a year. It can serialize into a single ISO 8601 field with 4, 6 or 8 digits (YYYY, YYYY-MM or YYYY-MM-DD).
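A minimal sketch of what such a wrapper might look like follows. No IsoDate class exists yet, so every method name here is an assumption; only Year, YearMonth and LocalDate are the real java.time types.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.ChronoField;
import java.time.temporal.Temporal;

// Hypothetical sketch of the proposed IsoDate; not an existing gbif-api class.
final class IsoDate {

    // Always a Year, YearMonth or LocalDate, enforced by the factories below.
    private final Temporal value;

    private IsoDate(Temporal value) {
        this.value = value;
    }

    static IsoDate of(int year) {
        return new IsoDate(Year.of(year));
    }

    static IsoDate of(int year, int month) {
        return new IsoDate(YearMonth.of(year, month));
    }

    static IsoDate of(int year, int month, int day) {
        return new IsoDate(LocalDate.of(year, month, day));
    }

    // All three wrapped types support ChronoField.YEAR, so no check is needed.
    int getYear() {
        return value.get(ChronoField.YEAR);
    }

    // The most precise representation available.
    Temporal mostPrecise() {
        return value;
    }

    // Year, YearMonth and LocalDate print as YYYY, YYYY-MM and YYYY-MM-DD
    // for four-digit years.
    @Override
    public String toString() {
        return value.toString();
    }
}
```

Because the constructor is private, the wrapper can never hold a non-ISO or time-only value, which is the guarantee TemporalAccessor lacks.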

I'm not too worried about @cgendreau's concerns. Deserializing three well-defined formats (YYYY, YYYY-MM and YYYY-MM-DD) is easy and fast. (Faster than deserializing an ISO date!)
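As a rough sketch of why this is easy: the three formats have different lengths, so a parser can dispatch on length and hand over to the standard java.time parsers. The class name is hypothetical, and this is one possible approach rather than a proposed implementation.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.Temporal;

// Hypothetical parser sketch, not part of gbif-api.
final class PartialDateParser {

    // Returns a Year, YearMonth or LocalDate depending on the input length.
    // Malformed input of a valid length fails with DateTimeParseException.
    static Temporal parse(String text) {
        switch (text.length()) {
            case 4:
                return Year.parse(text);       // YYYY
            case 7:
                return YearMonth.parse(text);  // YYYY-MM
            case 10:
                return LocalDate.parse(text);  // YYYY-MM-DD
            default:
                throw new IllegalArgumentException(
                        "Expected YYYY, YYYY-MM or YYYY-MM-DD but got: " + text);
        }
    }
}
```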

What does everyone else think?

  1. One millisecond hack to say it's only the year
  2. TemporalAccessor
  3. New IsoDate class
  4. Something else

The Varnish logs suggest two regular users of the Java API for this endpoint. (Plazi and GBIF Japan.)

MattBlissett commented 5 years ago

The registry is changed (2.97) to give:

"pubDate": "2016-01-01T00:00:00.001+0000",
…
"temporalCoverages": [{
    "@type": "range",
    "start": "2013-01-01T00:00:00.001+0000",
    "end": "2015-12-31T00:00:00.001+0000"
}],

i.e. use the 0.001 hack but fix the other issues with timezones and the end day of a year range.
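For API clients that want to detect the hack in the meantime, something like the following might work. It is a heuristic sketch only: the marker is undocumented, the class and method names are made up, and a genuine timestamp at exactly 00:00:00.001 would be misread.

```java
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side helper, not part of gbif-api.
final class MillisecondHack {

    // Matches the timestamps shown above, e.g. 2016-01-01T00:00:00.001+0000.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    // Time of day used as the marker: midnight plus one millisecond.
    private static final LocalTime MARKER = LocalTime.of(0, 0, 0, 1_000_000);

    // True if the time of day is exactly the marker value.
    static boolean looksYearPrecision(String timestamp) {
        OffsetDateTime parsed = OffsetDateTime.parse(timestamp, FORMAT);
        return parsed.toLocalTime().equals(MARKER);
    }

    // The year to display when the marker is present.
    static int yearOf(String timestamp) {
        return OffsetDateTime.parse(timestamp, FORMAT).getYear();
    }
}
```

This only checks the time of day, since the end of a range falls on 31 December rather than 1 January.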

We will revisit this for API V2. (No plan for that yet.)