UW-COSMOS / Cosmos

Knowledge base construction from raw scientific documents

sort/filter options for anserini, ES search: pub date, journal, publisher #100

Open cambro opened 4 years ago

cambro commented 4 years ago

It would be very useful to be able to sort the response by metadata other than the current ordering, which combines "confidence" and query matching. The big one is publication date: scientists will commonly want to see the latest results first. Filtering by journal comes second. Filtering by publisher would be a convenience, mostly for communication with publishers, though it could be useful for science too.
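A rough sketch of what such a request could look like as an Elasticsearch search body is given here for illustration. The field names `content`, `pub_date`, `journal` and `publisher`, and the helper `build_search_body`, are assumptions and not part of the current Cosmos schema or API; the `term` filters also assume that journal and publisher are indexed as keyword fields.

```python
def build_search_body(query, sort_by_date=False, journal=None, publisher=None):
    """Build an Elasticsearch search body with optional metadata sort and filters.

    All field names used here are hypothetical placeholders.
    """
    # Exact-match filters; filter clauses do not affect the relevance score.
    filters = []
    if journal is not None:
        filters.append({"term": {"journal": journal}})
    if publisher is not None:
        filters.append({"term": {"publisher": publisher}})

    body = {
        "query": {
            "bool": {
                "must": [{"match": {"content": query}}],
                "filter": filters,
            }
        }
    }
    if sort_by_date:
        # Newest first; relevance score breaks ties between equal dates.
        body["sort"] = [{"pub_date": {"order": "desc"}}, "_score"]
    return body


body = build_search_body("coronavirus", sort_by_date=True, journal="Nature")
```

Leaving `sort_by_date` off keeps the default relevance ordering, so existing callers would see no change in behavior.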

cambro commented 4 years ago

Bumping this one. It would be incredibly useful for COVID-19 work. New pubs are needed first!

cambro commented 4 years ago

Bumping this one again!

ankur-gos commented 4 years ago

I'll add these to the Elasticsearch schema and expose them via the API.

ankur-gos commented 4 years ago

The main change needed for this is that the calls to the xdd API need to happen at ingestion time, so we can store the returned metadata in the parquet objects.

I'll probably add the metadata to the PDFs parquet, which can then be loaded into Elasticsearch accordingly.
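As a minimal sketch of the ingestion-time step described above, the following attaches metadata to each PDF record before it would be written out. Everything named here is an assumption: `attach_metadata`, the `pdf_name` key, the three metadata fields, and `fetch_metadata`, which is a stand-in callable for the xdd API lookup rather than its real interface. Writing the rows to parquet and loading them into Elasticsearch is left out.

```python
def attach_metadata(pdf_records, fetch_metadata):
    """Return copies of pdf_records with bibliographic metadata added.

    fetch_metadata is a hypothetical callable standing in for the xdd API
    call; it takes a PDF name and returns a dict, or None if unknown.
    """
    fields = ("pub_date", "journal", "publisher")
    enriched = []
    for record in pdf_records:
        meta = fetch_metadata(record["pdf_name"]) or {}
        row = dict(record)  # copy so the input records are not mutated
        for field in fields:
            # Missing values stay None so every row has the same columns.
            row[field] = meta.get(field)
        enriched.append(row)
    return enriched


# Stubbed lookup table used in place of a real API call.
known = {"a.pdf": {"pub_date": "2020-03-01", "journal": "Nature"}}
rows = attach_metadata(
    [{"pdf_name": "a.pdf"}, {"pdf_name": "b.pdf"}],
    known.get,
)
```

Doing the lookup once per PDF at ingestion means search requests never have to wait on the external API, at the cost of re-ingesting if the upstream metadata changes.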