Cannot reproduce the BAAI/bge-reranker-large re-ranker model results

suraj-gade commented 1 year ago

Hi

I am using the “BAAI/bge-reranker-large” model via the AutoModelForSequenceClassification class to rerank the documents relevant to a question in my RAG setup.

Here is the sample code that I am using.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [[user_input, doc] for doc in documents]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)

Here, documents is a list of documents relevant to user_input, which is the user's question.

Most of the time I get the expected results, i.e. the document most relevant to the user question is ranked at the top with the highest score. But sometimes a different, irrelevant document is ranked at the top for the same question, where I was getting correct results earlier.

How can I reproduce the same results each time? Is there any parameter (like a seed) that we can use to reproduce the results?

staoxiao commented 1 year ago

Hi, there seems to be no bug in the code. Are you sure the reranker score is different for the same pairs? Maybe the candidate documents returned by the retriever model are different each time?
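One way to check this is to log a fingerprint of the exact pairs next to the scores on every run. The following is only a minimal sketch using the standard library; fingerprint_pairs and the example strings are hypothetical names and data, not part of FlagEmbedding or the original code.

```python
import hashlib
import json


def fingerprint_pairs(pairs):
    """Return a short, order-sensitive SHA-256 fingerprint of [query, doc] pairs."""
    # Serialize the pairs in a stable way, then hash the UTF-8 bytes.
    payload = json.dumps([list(p) for p in pairs], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Hypothetical example data standing in for user_input and documents.
user_input = "what is a panda?"
documents = [
    "The giant panda is a bear native to China.",
    "Paris is the capital of France.",
]
pairs = [[user_input, doc] for doc in documents]

# Identical pairs in the same order give the same fingerprint.
print(fingerprint_pairs(pairs) == fingerprint_pairs(pairs))

# The same documents in a different order give a different fingerprint.
print(fingerprint_pairs(pairs) == fingerprint_pairs(list(reversed(pairs))))
```

If the fingerprint changes between two runs, the reranker input changed (different documents, different order, or different text after preprocessing), so the scores are not expected to match.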

suraj-gade commented 1 year ago

Hi @staoxiao, yes, I am getting different scores from the reranker for the same pairs. I have checked the documents fetched by the retriever model; they are the same each time, so there is no issue in that part.

The problem is in reproducing the same reranker scores each time with the same document pairs.
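To narrow this down, the same pairs can be scored several times in one process and the runs compared. The sketch below assumes a hypothetical score_fn callable that takes the list of pairs and returns one score per pair; toy_score_fn is only a deterministic stand-in so that the snippet runs without downloading the model. Since model.eval() disables dropout, a seed should not normally be needed for inference, so any drift measured this way would point at something else.

```python
def max_score_drift(score_fn, pairs, runs=3):
    """Score the same pairs `runs` times and return the largest
    absolute difference from the first run."""
    baseline = [float(s) for s in score_fn(pairs)]
    drift = 0.0
    for _ in range(runs - 1):
        current = [float(s) for s in score_fn(pairs)]
        if len(current) != len(baseline):
            raise ValueError("score_fn returned a different number of scores")
        for b, c in zip(baseline, current):
            drift = max(drift, abs(b - c))
    return drift


def top_index(scores):
    """Return the index of the highest score, i.e. the top-ranked document."""
    return max(range(len(scores)), key=lambda i: scores[i])


def toy_score_fn(pairs):
    """Hypothetical stand-in scorer: counts words shared by query and document."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


# Hypothetical example data.
pairs = [
    ["giant panda habitat", "The giant panda lives in bamboo forests."],
    ["giant panda habitat", "Paris is the capital of France."],
]

print(max_score_drift(toy_score_fn, pairs))  # 0.0 for a deterministic scorer
print(top_index(toy_score_fn(pairs)))        # 0, the panda document
```

With the real model, score_fn would wrap the tokenizer and model calls from the code above and return scores.tolist(). A drift of exactly 0.0 within one process would suggest the difference comes from somewhere outside the forward pass, for example the input pairs, the batch composition, or the hardware and library versions used between runs.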

staoxiao commented 1 year ago

The observed phenomenon is peculiar, and as of now we have no idea what causes it either. The anomaly may be related to Hugging Face Transformers or PyTorch.

FlagOpen / FlagEmbedding

Cannot reproduce the BAAI/bge-reranker-large re-ranker model results #254