OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.
Creative Commons Zero v1.0 Universal

Add support for uncertain date formats (ISO 8601-2:2019) #15

Open jeffreyameyer opened 4 years ago

jeffreyameyer commented 4 years ago

What's your idea for a cool feature that would help you use OHM better?

It would be great to have support for uncertain dates and time ranges.

This support could accommodate vague or unknown start times, approximate dates, etc.

More details can be found at: https://www.iso.org/standard/70908.html

This feature will have significant impacts across OHM, including data storage, rendering rules, and time slider behavior.

Current workarounds

None. The site currently requires specific, unqualified dates in the older ISO 8601 format.

danrademacher commented 4 years ago

Over in this comment https://github.com/OpenHistoricalMap/issues/issues/146#issuecomment-686755914 @hroest asked about whether this is the place to handle uncertain dates: https://github.com/OpenHistoricalMap/DateFunctions-plpgsql

I think the general answer is "yes" but there's a lot more to say.

The functions over there convert an ISO date to a "decimal date" - a year plus a fractional decimal that allows us to reliably sort and filter items with the timeslider.
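
To make the "decimal date" idea concrete, here is a minimal Python sketch. It is not the algorithm used in DateFunctions-plpgsql; the function name `to_decimal_date` is hypothetical, and it assumes a full YYYY-MM-DD date in the year range that Python's `datetime.date` supports (years 1 through 9998 here, so no BCE dates).

```python
from datetime import date


def to_decimal_date(iso_date: str) -> float:
    """Convert a full YYYY-MM-DD date to the year plus the fraction of the year elapsed.

    Hypothetical helper for illustration only; the real PL/pgSQL functions
    may round or offset differently.
    """
    d = date.fromisoformat(iso_date)
    year_start = date(d.year, 1, 1)
    next_year_start = date(d.year + 1, 1, 1)
    days_in_year = (next_year_start - year_start).days  # 365, or 366 in a leap year
    return d.year + (d - year_start).days / days_in_year


print(to_decimal_date("1950-01-01"))  # 1950.0
print(to_decimal_date("1959-12-31"))  # just under 1960
```

Because the result is a single number that increases with the calendar date, it can be sorted and compared directly, which is what the timeslider filter needs.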

We could in theory have methods to convert other date formats to "decimal date", but right now in all cases the translation would have to result in a single decimal number.

So 1950s would need to be assumed to be 1950. I wonder if we might end up with some issues where 1950s as start_date should be 1950 but as end_date should be 1959.99999.

1955..1960 would need to be either 1955 or 1960 or, who knows, maybe 1957.5.

I think each of these cases is solvable in code, but needs some clarity on what we want to do.

danrademacher commented 4 years ago

handling date ranges

Based on discussion in Discord with @padiwik, we have a feasible way forward here to at least manage these in the near term:

as a first approximation, i think it's ok if the server returns the latest possible start date, i.e. when we are sure the item in question exists

So given an input like DATE..DATE then we would:

  1. split on ..
  2. If start_date, take the second item, so the feature is shown only once we have the highest confidence that it exists.
  3. If end_date, take the first item, to be conservative about when it's gone.
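
The three steps above could be sketched in Python roughly as follows. The function name `resolve_range` and its signature are hypothetical, and the sketch assumes both sides of the `..` are present; open-ended ranges such as `..1850` are not handled.

```python
def resolve_range(value: str, key: str) -> str:
    """Pick one endpoint from a DATE..DATE range, depending on the tag key.

    Hypothetical helper: start_date gets the later endpoint, end_date gets
    the earlier one, so the feature is only shown when it surely exists.
    """
    if ".." not in value:
        return value  # not a range, pass through unchanged

    first, second = value.split("..", 1)
    if key == "start_date":
        return second  # latest possible start
    if key == "end_date":
        return first  # earliest possible end
    raise ValueError(f"unsupported key: {key!r}")


print(resolve_range("1955..1960", "start_date"))  # 1960
print(resolve_range("1955..1960", "end_date"))    # 1955
```

The chosen endpoint would then go through the existing decimal date conversion like any other single date.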

handling decades

For decades with YYYYs we would:

  1. Drop the s
  2. If start_date, assume 1950-01-01 and convert to decimal date
  3. If end_date, assume 1959-12-31 and convert to decimal date
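
A minimal Python sketch of the decade rule above, assuming four-digit decades written as YYY0s. The name `resolve_decade` is hypothetical, and the result is an ISO date string that would still need the decimal date conversion.

```python
import re

# Matches four-digit decades such as "1950s"; other spellings are out of scope here.
DECADE = re.compile(r"(\d{3}0)s")


def resolve_decade(value: str, key: str) -> str:
    """Expand a decade like "1950s" to its first or last day, depending on the tag key."""
    match = DECADE.fullmatch(value)
    if match is None:
        return value  # not a decade, pass through unchanged

    first_year = int(match.group(1))  # step 1: drop the trailing "s"
    if key == "start_date":
        return f"{first_year:04d}-01-01"
    if key == "end_date":
        return f"{first_year + 9:04d}-12-31"
    raise ValueError(f"unsupported key: {key!r}")


print(resolve_decade("1950s", "start_date"))  # 1950-01-01
print(resolve_decade("1950s", "end_date"))    # 1959-12-31
```

Note that this picks the widest span (earliest start, latest end), which is the opposite of the rule proposed for DATE..DATE ranges.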

padiwik commented 4 years ago

why is your suggested behavior for decades distinct from other ranges?

danrademacher commented 4 years ago

Ah, maybe you'd prefer the more conservative approach of

I am naturally glib and prefer to see more data with less certainty than vice versa. But very open to the inverse!

padiwik commented 4 years ago

I suggested the conservative approach because I don't believe a feature dated "before 1850" should appear at the beginning of time. I then thought the approach should be consistent, but it could also make sense to treat a range differently when both its beginning and its end are known.

jeffreyameyer commented 3 years ago

Where are we on this, given recent discussions about parsers, etc.? @danrademacher @rwelty1889 @geohacker @batpad

I'm getting the sense that this may not be as tricky to implement across the stack (e.g. core db, tile filters, stylesheets, etc.) as I had thought. Am I wrong?

We do have a lot of user requests for supporting this.

A workaround I think might work (even if a workaround is lame) is [foo]_date.edtf =~1976

1ec5 commented 1 year ago

A workaround I think might work (even if a workaround is lame) is [foo]_date.edtf =~1976

Unfortunately, I think it might be even more complex if we push the responsibility for parsing EDTF onto the client. The vector tiles currently encode start_date and end_date as decimal numbers to get around the fact that the Mapbox Style Specification’s expression language lacks some important string operations, such as regular expression matching (mapbox/mapbox-gl-js#4089) and string splitting (maplibre/maplibre-gl-js#2064). If the tiles contain EDTF verbatim, the frontend would need to use a fork of GL JS that provides a hook so that the website can extend it with an EDTF-parsing operator. That operator could be implemented using EDTF.js, but a fork of GL JS might come with some unwanted maintenance overhead, and it would limit compatibility with potential third-party projects.

The alternative of parsing within PostgreSQL might be feasible. If we aren’t comfortable rolling our own parser in PL/pgSQL, perhaps ohm-deploy could define a Python function that uses python-edtf to do the parsing. Or a Rust function that uses edtf-rs if a Rust driver is installed, etc.