Refactor IBM Model 1 reranker (+ LTR)

lintool commented 2 years ago

The IBM Model 1 reranker works (and is tested, yay!) - but the organization could be improved.

For example, there's no counterpart of a Searcher that just "does the ranking":

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

So, we have two parts - a well-abstracted searcher, and a "main" program for end-to-end runs.

TranslationProbabilitySearcher is probably the most descriptive name? IBM Model 1 is how we get the translation probabilities. Query likelihood and ColBERT's MaxSim are how we actually do the scoring.
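To make the split between "getting translation probabilities" and "doing the scoring" concrete, here is a minimal sketch of the query-likelihood half only (MaxSim is not shown). It is not the existing Pyserini code: the class name ToyTranslationReranker, its methods, the floor parameter, and the toy translation table are all hypothetical, and the table is assumed to have been produced elsewhere (e.g., by IBM Model 1 training).

```python
import math
from collections import Counter


class ToyTranslationReranker:
    """Hypothetical sketch of a reranker that just "does the ranking"."""

    def __init__(self, tprob, floor=1e-9):
        # tprob[(query_term, doc_term)] = t(query_term | doc_term).
        # Assumed to be precomputed, e.g. by IBM Model 1.
        self.tprob = tprob
        # Floor keeps log() defined when no document term translates to a query term.
        self.floor = floor

    def score(self, query_terms, doc_terms):
        # Translation-based query likelihood:
        #   log P(q | d) = sum_i log( sum_w t(q_i | w) * P(w | d) )
        # with P(w | d) taken as the unsmoothed relative frequency of w in d.
        counts = Counter(doc_terms)
        n = len(doc_terms) or 1
        log_p = 0.0
        for q in query_terms:
            p = sum(self.tprob.get((q, w), 0.0) * c / n for w, c in counts.items())
            log_p += math.log(max(p, self.floor))
        return log_p

    def rerank(self, query_terms, docs):
        # docs maps docid -> list of tokens; returns (docid, score), best first.
        scored = [(docid, self.score(query_terms, terms)) for docid, terms in docs.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, made up purely for illustration.
tprob = {
    ("lobster", "lobster"): 0.8,
    ("lobster", "crab"): 0.2,
    ("roll", "roll"): 0.9,
    ("roll", "sandwich"): 0.3,
}
docs = {
    "d1": ["lobster", "roll"],
    "d2": ["crab", "sandwich"],
    "d3": ["car", "engine"],
}
ranked = ToyTranslationReranker(tprob).rerank(["lobster", "roll"], docs)
for docid, score in ranked:
    print(f"{docid} {score:.5f}")
```

A "main" program for end-to-end runs would then only need to parse arguments, fetch candidates from a first-stage searcher, and call rerank.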

Ultimately, we want to be able to call something like python -m pyserini.search.lucene.tprob ... (tprob is just my proposal for now).

And the ltr stuff can maybe become python -m pyserini.search.lucene.ltr ...

This proposal will fit into the same structure as this: https://github.com/castorini/pyserini/issues/659#issuecomment-934308746

stephaniewhoo commented 2 years ago

We used to put ltr as a module under search. But running ltr requires local Pyserini instead of pypi, so we put the 'main' program for end-to-end runs back as a script, i.e., calling python scripts/ltr.py .... Should we now change it to python -m pyserini.search.lucene.ltr ...? Though I think the IBM reranker is fine? @yuki617

lintool commented 2 years ago

But running ltr requires local Pyserini instead of pypi, ...

What is it about the impl that requires a local installation? If it's the model, we can put it under PYSERINI_CACHE, just like the indexes?
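A rough sketch of what "put it under PYSERINI_CACHE" could look like. The helper name model_cache_dir and the "models" subdirectory are hypothetical, and the ~/.cache/pyserini fallback is an assumption about the default cache location rather than something stated in this thread.

```python
import os
from pathlib import Path


def model_cache_dir(subdir="models"):
    """Return (and create) a directory for model files under the cache root.

    Hypothetical helper: "models" is an illustrative subdirectory name.
    """
    # Honor PYSERINI_CACHE if set; otherwise fall back to an assumed default.
    root = os.environ.get("PYSERINI_CACHE") or os.path.join(
        os.path.expanduser("~"), ".cache", "pyserini"
    )
    path = Path(root) / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With something like this, the ltr module could download its model into the cache on first use, so a PyPI install would not need files from a local checkout.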

lintool commented 2 years ago

Ref #967

@stephaniewhoo anything more you want to do, or are we done here? if done, please go ahead and close issue.

stephaniewhoo commented 2 years ago

I am good for now.

castorini / pyserini

Refactor IBM Model 1 reranker (+ LTR) #949