avalonmediasystem / avalon

Avalon Media System – Samvera Application
http://www.avalonmediasystem.org/
Apache License 2.0

Rework Indexing for Unknown Date Values #4911

Open joncameron opened 2 years ago

joncameron commented 2 years ago

Description

In Avalon, dates are indexed so that faceting on a specific year, such as 1968, includes every item whose EDTF date range could contain that year. For example, faceting on 1968 includes every item that has 19uu or 196u as the date value in its metadata. This inflates the counts for individual years and provides little value to people who are faceting to find material from a specific time.

Another funky side effect is that facet values like "2069" show up for date, even though there appear to be no records with that value (only 20uu records, which are then expanded to every year in the century). See https://media.dlib.indiana.edu/catalog?f%5Bdate_sim%5D%5B%5D=2030 for an example of this behavior.
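To make the effect concrete, here is a minimal sketch in Python. It is an illustration only, not Avalon's indexing code: `expand_years`, the sample `records`, and the two counters are hypothetical names, and it assumes four-character year values with trailing `u` placeholders.

```python
from collections import Counter

def expand_years(value):
    """Return every year a value like '196u' or '19uu' could stand for."""
    stem = value.rstrip("u")
    unknown = len(value) - len(stem)      # number of trailing 'u' digits
    if unknown == 0:
        return [int(value)]
    start = (int(stem) if stem else 0) * 10 ** unknown
    return list(range(start, start + 10 ** unknown))

# Hypothetical date values from four records.
records = ["1968", "196u", "19uu", "20uu"]

# Expanding every value: each uncertain record counts toward every year it could cover.
facet = Counter(year for value in records for year in expand_years(value))

# One possible alternative: only fully specified years feed the year facet.
exact = Counter(int(value) for value in records if value.isdigit())

print(facet[1968], exact[1968])
print(2069 in facet, 2069 in exact)
```

With expansion, three of the four records count toward 1968 and a single 20uu record is enough to create a 2069 facet value; indexing only exact years leaves one record under 1968 and no 2069 entry at all.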

elynema commented 8 months ago

That behavior comes from the EDTF gem that we use to format dates; it is backward-compatible support for a feature of the EDTF draft spec. The draft called it "Masked Precision", so in the gem's logic 190x signifies a decade (1900-1909) and 20xx a century (2000-2099). Masked precision was removed from the final spec, and unspecified digits in a year should use "X" as the designator under the final spec.

I don't remember exactly how the faceting processes the EDTF date objects, but it should be something we can handle better. Maybe we should also create another migration to update the date records so that they comply with the EDTF final spec.
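As a rough sketch of what such a migration might do to each value, here is a Python illustration. It is not Avalon or EDTF gem code: `to_final_spec` is a hypothetical helper, and it assumes the stored values are plain `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` strings that use the draft's lowercase `u` for unspecified digits. Anything that does not match that shape is left alone.

```python
import re

# Draft-style date: digits or 'u' placeholders in YYYY, YYYY-MM, or YYYY-MM-DD form.
DRAFT_DATE = re.compile(r"[0-9u]{4}(-[0-9u]{2}){0,2}")

def to_final_spec(value):
    """Rewrite draft 'u' placeholders as the final spec's 'X'; leave other values untouched."""
    if DRAFT_DATE.fullmatch(value):
        return value.replace("u", "X")
    return value

for value in ["19uu", "196u", "1999-uu", "1968", "unknown"]:
    print(value, "->", to_final_spec(value))
```

A real migration would also need to decide how to treat intervals, qualifiers such as `?` and `~`, and any masked-precision `x` values, none of which this sketch handles.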