CDRH / api

Codenamed "Apium": An API to access all public Center for Digital Research in the Humanities resources
https://cdrhdev1.unl.edu/api_frontend
MIT License

Decade faceting #113

Open jduss4 opened 3 years ago

jduss4 commented 3 years ago

For our date.year approach, the API currently uses a date histogram aggregation with intervals. This works great (NOTE: at least until v7, when we will need to change how we send the interval), but as far as I know it can't accommodate decades.

There is a date range aggregation, which I assume we could use if we just parsed the incoming year and populated the range ourselves. That is, if we sent 1887 for date.decade then we would populate 1880-1889 or something like that. So that's an option, though not a particularly elegant one.
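A minimal sketch of that date range idea, assuming the incoming year arrives as an integer or numeric string and that the aggregation body is built as a plain dictionary. The function names `decade_bounds` and `decade_range_agg` and the field name `date` are hypothetical and not part of the API:

```python
def decade_bounds(year):
    """Return the first and last year of the decade containing `year`."""
    start = (int(year) // 10) * 10
    return start, start + 9


def decade_range_agg(field, years):
    """Build a date_range aggregation body with one range per decade.

    `field` is a hypothetical date field name. The "to" bound of a
    date_range is exclusive, so each range ends on January 1 of the
    following decade.
    """
    starts = sorted({decade_bounds(y)[0] for y in years})
    ranges = [
        {
            "key": f"{start}-{start + 9}",
            "from": f"{start}-01-01",
            "to": f"{start + 10}-01-01",
        }
        for start in starts
    ]
    return {
        "date_range": {
            "field": field,
            "format": "yyyy-MM-dd",
            "ranges": ranges,
        }
    }


# Sending 1887 for date.decade would populate a single 1880-1889 range.
body = decade_range_agg("date", [1887])
```

Years before 1000 would need zero-padding to match the `yyyy-MM-dd` format, which this sketch does not handle.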

There is also an auto-interval date histogram aggregation which would definitely be interesting to use, and can accommodate 10 year spans, but I don't think it would lock to our expectation of decades. This would be more like the Nebraska Newspapers searching functionality where we tell it "break the results into clumps of years" and it returns with "1881-1885 (9), 1886-1891 (13) ..."
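For comparison, a rough sketch of what an auto-interval request body could look like. The helper name `auto_year_clumps_agg` and the field name `date` are hypothetical. With `auto_date_histogram`, `buckets` is only a target number of buckets and Elasticsearch picks the interval itself, so nothing here forces decade-sized buckets:

```python
def auto_year_clumps_agg(field, buckets=10):
    """Build an auto_date_histogram aggregation body.

    `buckets` is a target count: Elasticsearch chooses an interval that
    yields at most that many buckets, so the span of each bucket is not
    under the caller's control.
    """
    return {
        "auto_date_histogram": {
            "field": field,
            "buckets": buckets,
            "format": "yyyy",  # label buckets by year only
        }
    }


body = auto_year_clumps_agg("date", buckets=8)
```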

The other option is to prepopulate decade into a field in the API itself during the ingest. Presumably we could figure out how to do this with the Elasticsearch schema / mapping itself, if we wanted to be fancy, or just use the data repo scripts if we want to be less fancy. The big downside to this is that we would only be able to do this for one date per document + nested fields. That is, if we have an author born in 1900 who died in 1957, do we set the "decade" as 1900 or 1950? It would be more convenient if we could use the API functionality to instead ask for death_date.decade and get 1950, etc.
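As an illustration of the "less fancy" script route, here is a sketch that derives a decade at ingest time. It assumes dates are stored as strings beginning with a four-digit year. The helper `add_decade_fields`, the `_decade` suffix, and the `birth_date` field name are hypothetical and not part of the existing data repo scripts:

```python
def add_decade_fields(doc, date_fields):
    """Return a copy of `doc` with a `<name>_decade` value per date field.

    Assumes each date value is a string starting with a four-digit year,
    e.g. "1957-03-02". Missing or empty fields are skipped.
    """
    out = dict(doc)
    for name in date_fields:
        value = doc.get(name)
        if not value:
            continue
        year = int(str(value)[:4])
        out[f"{name}_decade"] = (year // 10) * 10
    return out


# The author example from above: born in 1900, died in 1957.
author = {"birth_date": "1900-01-01", "death_date": "1957-03-02"}
enriched = add_decade_fields(author, ["birth_date", "death_date"])
```

Each derived value would still need its own field in the schema, which is the limitation described above.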

Related to #31

wkdewey commented 1 year ago

Is this necessary for Habeas?

karindalziel commented 1 year ago

Let's leave this for a later release