castorini / covidex

A multi-stage neural search engine for the COVID-19 Open Research Dataset
https://covidex.ai
MIT License

sorting #47

Open kyunghyuncho opened 4 years ago

kyunghyuncho commented 4 years ago

It is somewhat related to faceting (https://github.com/castorini/covidex/issues/46) but requires much less work and thought.

One piece of feedback we have received directly through the feedback form is the need to sort results by publication date. I agree this is an important feature, especially since our index is updated weekly: what people want is more of a differential view, i.e., what is new since the last update.

Can we add it quickly?

cc @lintool @edwinzhng @nikhilro

kyunghyuncho commented 4 years ago

Since we don't want to break re-ranking too much, how about a "chronologically sort the top 50" option?
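A minimal sketch of the "chronologically sort the top 50" idea, assuming each hit carries a relevance score from the re-ranker and a publication date from the metadata. The `Hit` type and `sort_top_k_by_date` function are hypothetical illustrations, not covidex code. Relevance still decides which 50 documents appear; the date only decides their order within that set.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Hit:
    # Hypothetical result record; not the covidex data model.
    doc_id: str
    score: float                # relevance score from the (re-)ranker
    published: Optional[date]   # publication date; may be missing in metadata


def sort_top_k_by_date(hits: List[Hit], k: int = 50) -> List[Hit]:
    """Take the k most relevant hits, then reorder only those k newest-first."""
    top_k = sorted(hits, key=lambda h: h.score, reverse=True)[:k]
    # Undated hits are treated as oldest so they sink to the bottom.
    return sorted(top_k, key=lambda h: h.published or date.min, reverse=True)


hits = [
    Hit("a", 0.9, date(2019, 1, 1)),
    Hit("b", 0.8, date(2020, 3, 1)),
    Hit("c", 0.1, date(2020, 4, 1)),  # newest, but not relevant enough for k=3
    Hit("d", 0.7, None),
]
print([h.doc_id for h in sort_top_k_by_date(hits, k=3)])  # ['b', 'a', 'd']
```

Because the cut-off is applied before the date sort, a very recent but irrelevant paper ("c" above) never enters the list, which is what keeps the re-ranker's work from being thrown away.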

lintool commented 4 years ago

This is actually a much harder problem than it seems... and it dates back more than 20 years, to the dawn of web search. The problem back then was how to combine a static prior (say, PageRank or HITS) with a relevance score (e.g., BM25). Since both are continuous quantities, you've transformed a simple linear ranking problem into a multi-objective optimization problem, the two dimensions being relevance and page quality.

This is exactly the same situation here; just substitute time for PageRank...
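To make the trade-off concrete, here is one simple and common way of combining the two signals: a weighted linear interpolation of a min-max-normalized relevance score with an exponentially decaying recency prior. This is a swapped-in illustration, not what covidex does and not a solution to the multi-objective problem; `recency_prior`, `combined_ranking`, `alpha`, and `half_life_days` are all hypothetical names and knobs.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical hit format: (doc_id, relevance score, publication date).
Hit = Tuple[str, float, date]


def recency_prior(published: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: 1.0 for a paper published today, 0.5 after half_life_days."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so they are on the same scale as the prior."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [1.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def combined_ranking(hits: List[Hit], today: date, alpha: float = 0.8,
                     half_life_days: float = 180.0) -> List[str]:
    """Rank by alpha * relevance + (1 - alpha) * recency; returns doc_ids best-first."""
    relevance = min_max([score for _, score, _ in hits])
    scored = []
    for (doc_id, _, published), rel in zip(hits, relevance):
        prior = recency_prior(published, today, half_life_days)
        scored.append((alpha * rel + (1 - alpha) * prior, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


today = date(2020, 5, 1)
hits = [
    ("old_relevant", 10.0, date(2015, 1, 1)),
    ("new_less_relevant", 8.0, date(2020, 5, 1)),
    ("new_irrelevant", 1.0, date(2020, 5, 1)),
]
print(combined_ranking(hits, today, alpha=1.0))  # pure relevance order
print(combined_ranking(hits, today, alpha=0.8))  # recency lifts the newer paper above the old one
```

Every fixed `alpha` (and half-life) picks a single point on the relevance/recency trade-off curve; deciding which point users actually want is exactly the hard part described above, and the sketch does not answer it.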

rodrigonogueira4 commented 4 years ago

Can we do it like Google Scholar, i.e., "since 2020", "since 2019", ..., "custom range"?

lintool commented 4 years ago

@rodrigonogueira4 that's faceting ;) ... which brings us back to #46
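The Google Scholar-style option differs from sorting in one important way: it removes documents outside a date window but leaves the relevance order of the rest untouched, which is why it falls under faceting rather than re-ranking. A minimal sketch, assuming the input is already in relevance order; `date_range_filter` is a hypothetical helper, and dropping undated hits whenever a bound is set is an assumption (a real system might keep them instead).

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical hit format: (doc_id, relevance score, publication date or None).
Hit = Tuple[str, float, Optional[date]]


def date_range_filter(hits: List[Hit], since: Optional[date] = None,
                      until: Optional[date] = None) -> List[Hit]:
    """Keep hits published within [since, until], preserving the input (relevance) order.

    Hits without a publication date are dropped whenever either bound is given.
    """
    if since is None and until is None:
        return list(hits)
    kept = []
    for hit in hits:
        published = hit[2]
        if published is None:
            continue
        if since is not None and published < since:
            continue
        if until is not None and published > until:
            continue
        kept.append(hit)
    return kept


ranked = [
    ("a", 3.0, date(2019, 6, 1)),
    ("b", 2.0, date(2020, 2, 1)),
    ("c", 1.0, None),
    ("d", 0.5, date(2020, 4, 1)),
]
# "since 2020" corresponds to since=date(2020, 1, 1).
print([doc_id for doc_id, _, _ in date_range_filter(ranked, since=date(2020, 1, 1))])  # ['b', 'd']
```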