PrithivirajDamodaran / FlashRank

Lite & super-fast re-ranking for your search & retrieval pipelines. Supports SoTA listwise and pairwise reranking based on LLMs, cross-encoders, and more. Created by Prithivi Da; open for PRs & collaborations.
Apache License 2.0

Reranking with ms-marco-MultiBERT-L-12 returns complex score #18

Closed · M1ha-Shvn closed this issue 1 month ago

M1ha-Shvn commented 1 month ago

Hi. flashrank version: 0.2.4. I'm trying to implement a reranker using the ms-marco-MultiBERT-L-12 model (my language is not English). I do the following:

from flashrank import Ranker, RerankRequest

passages = [
    {"id": "1", "text": "Our library is closed at 3pm.", "meta": {}},
    {"id": "2", "text": "You can buy books cheaply in our library book store.", "meta": {}},
    {"id": "3", "text": "The library address is Washington street, 7.", "meta": {}},
]
query = "How do I get to the library?"
request: RerankRequest = RerankRequest(query=query, passages=passages)
ranker = Ranker(model_name="ms-marco-MultiBERT-L-12", cache_dir="/app/src/flashrank_cache", max_length=1024)
result = ranker.rerank(request)

What I get is very unexpected:

  1. The score field is not a float but a list of floats. What does it mean?
  2. The most relevant result is in third place:
    [
    {'id': '2', 'text': 'You can buy books cheaply in our library book store.', 'meta': {}, 'score': [0.9290498495101929, 0.10876275599002838]}, 
    {'id': '1', 'text': 'Our library is closed at 3pm.', 'meta': {}, 'score': [0.8656406998634338, 0.17762842774391174]}, 
    {'id': '3', 'text': 'The library address is Washington street, 7.', 'meta': {}, 'score': [0.06054011359810829, 0.9271655082702637]}
    ]

    It looks like this strange score should be sorted by its second number; in that case the result would be relevant. Can you give me a clue what I'm doing wrong?
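
For anyone stuck on 0.2.4 before the fix below, a minimal client-side workaround sketch, assuming the two-element score is a per-class output whose second element tracks relevance (an assumption based on the output above, not documented behaviour):

from flashrank import Ranker, RerankRequest

passages = [
    {"id": "1", "text": "Our library is closed at 3pm.", "meta": {}},
    {"id": "2", "text": "You can buy books cheaply in our library book store.", "meta": {}},
    {"id": "3", "text": "The library address is Washington street, 7.", "meta": {}},
]
query = "How do I get to the library?"

ranker = Ranker(model_name="ms-marco-MultiBERT-L-12", cache_dir="/app/src/flashrank_cache", max_length=1024)
results = ranker.rerank(RerankRequest(query=query, passages=passages))

# Re-sort by the second score component when the score comes back as a list (0.2.4 behaviour);
# fall back to the raw score once it is a plain float.
results = sorted(
    results,
    key=lambda r: r["score"][1] if isinstance(r["score"], (list, tuple)) else r["score"],
    reverse=True,
)
for r in results:
    print(r["id"], r["score"])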

PrithivirajDamodaran commented 1 month ago
  1.) The score tensors having 2 dims was an issue; it is fixed now, and you should get passages sorted by the score.
  2.) The models will rank fine if the passages don't have grammatical errors or typos AND if the query is sensible. See below:
from flashrank import Ranker, RerankRequest
ranker = Ranker("ms-marco-MultiBERT-L-12", log_level="DEBUG")

passages = [
    {"id": "1", "text": "Our library is closed at 3pm.", "meta": {}},
    {"id": "2", "text": "You can buy books cheaply in our library book store.", "meta": {}},
    {"id": "3", "text": "The library address is in Washington street, 7.", "meta": {}},
]
query = "Where is the library?"

request = RerankRequest(query=query, passages=passages)
results = ranker.rerank(request)

This prints:

{'id': '3', 'text': 'The library address is in Washington street, 7.', 'meta': {}, 'score': 0.9984252}
{'id': '1', 'text': 'Our library is closed at 3pm.', 'meta': {}, 'score': 0.0036173377}
{'id': '2', 'text': 'You can buy books cheaply in our library book store.', 'meta': {}, 'score': 0.0020632916}

Upgrade to 0.2.5
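
For completeness, a small check sketch after upgrading (e.g. pip install -U flashrank==0.2.5), assuming the fixed release returns a single scalar score per passage as shown above:

from flashrank import Ranker, RerankRequest

ranker = Ranker("ms-marco-MultiBERT-L-12")
request = RerankRequest(
    query="Where is the library?",
    passages=[{"id": "3", "text": "The library address is in Washington street, 7.", "meta": {}}],
)
results = ranker.rerank(request)

score = results[0]["score"]
print(type(score), score)                    # expect a scalar, e.g. 0.99..., not a two-element list
assert not isinstance(score, (list, tuple))  # 0.2.4 returned a list here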