beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Reranking with custom SentenceTransformer model: 'SentenceTransformer' object has no attribute 'encode_queries' #6

Closed. corticalstack closed this issue 3 years ago.

corticalstack commented 3 years ago

Hi,

After generating queries with T5Tokenizer and T5ForConditionalGeneration, I used the generated queries to fine-tune models.Transformer('distilbert-base-uncased'), following:

https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation

I am now attempting to use this model to re-rank results from BM25 as per following example:

https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/reranking/evaluate_bm25_sbert_reranking.py

I instantiate my model with:

SentenceTransformer(self.model_path)

followed by:

from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(my_neural_model, batch_size=128)
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100])
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)

But get:

'SentenceTransformer' object has no attribute 'encode_queries' from:

File "/home/jp/anaconda3/envs/pybase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SentenceTransformer' object has no attribute 'encode_queries'

I can see that example re-rankers, such as the cross-encoder, define the methods encode_queries and encode_corpus.

Grateful for any pointer on what obvious errors I may be making, and on how I can use my own model with cos-sim to rerank. Many thanks.

thakur-nandan commented 3 years ago

Hi @corticalstack,

Rerankers such as the cross-encoder define a predict method; they do not encode queries and documents separately.
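The distinction can be sketched with two toy classes (illustrative stand-ins, not BEIR's actual implementations): a bi-encoder exposes encode_queries/encode_corpus and embeds each side separately, while a cross-encoder scores each (query, document) pair jointly via predict.

```python
from typing import List, Tuple

import numpy as np


class ToyBiEncoder:
    """Bi-encoder style: encodes queries and documents into vectors separately."""

    def encode_queries(self, queries: List[str]) -> np.ndarray:
        # Toy features: text length and word count (stand-in for a real model).
        return np.array([[len(q), q.count(" ")] for q in queries], dtype=float)

    def encode_corpus(self, docs: List[str]) -> np.ndarray:
        return np.array([[len(d), d.count(" ")] for d in docs], dtype=float)


class ToyCrossEncoder:
    """Cross-encoder style: scores each (query, document) pair jointly."""

    def predict(self, pairs: List[Tuple[str, str]]) -> np.ndarray:
        # Toy score: word overlap between query and document.
        return np.array(
            [len(set(q.split()) & set(d.split())) for q, d in pairs], dtype=float
        )


scores = ToyCrossEncoder().predict(
    [("python list sort", "how to sort a list in python")]
)
```

Because a cross-encoder never produces standalone embeddings, it cannot be dropped into a pipeline that calls encode_queries, which is exactly the AttributeError above.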

In the example you are trying to run, how about providing the path of your tuned distilbert model directly to models.SentenceBERT itself, as mentioned below?

from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from beir.retrieval import models

model = DRES(models.SentenceBERT(distilbert_model_path), batch_size=128) 
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100]) 
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)
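For intuition, the cos_sim rerank step conceptually re-scores the BM25 candidates the way the sketch below does (toy_embed and rerank_by_cos_sim are made up for illustration; BEIR's actual rerank uses the model's embeddings and runs in batches):

```python
import numpy as np


def toy_embed(texts):
    # Stand-in encoder: bag-of-words counts over a tiny vocabulary
    # (illustrative only; a real model would produce dense embeddings).
    vocab = ["fast", "slow", "cat", "dog", "runs"]
    return np.array([[t.split().count(w) for w in vocab] for t in texts], dtype=float)


def rerank_by_cos_sim(query, candidates, top_k=2):
    # Encode the query and the top-k candidates, then reorder the
    # candidates by cosine similarity to the query.
    q = toy_embed([query])[0]
    d = toy_embed(candidates)
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-9)
    order = np.argsort(-sims)[:top_k]
    return [candidates[i] for i in order]


ranked = rerank_by_cos_sim("fast dog", ["cat runs", "fast dog runs", "slow cat"])
```

The BM25 stage supplies the candidate pool; the dense model only re-orders it, which is why top_k bounds the cost of reranking.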

thakur-nandan commented 3 years ago

In the future, if you wish to run a custom model, provide a custom class with encode_queries and encode_corpus methods. You can find template code below:

from typing import Dict, List

import numpy as np

from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

class YourCustomModel:
    def __init__(self, model_path=None, **kwargs):
        self.model = None # ---> HERE load your custom model

    # Write your own query encoding function (returns query embeddings as a numpy array)
    def encode_queries(self, queries: List[str], batch_size: int, **kwargs) -> np.ndarray:
        pass

    # Write your own corpus encoding function (returns document embeddings as a numpy array)
    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int, **kwargs) -> np.ndarray:
        pass

model = DRES(YourCustomModel(my_neural_model_path), batch_size=128) 
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100]) 
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)
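As a concrete sanity check, here is one minimal, self-contained way such a class could be filled in, with a toy bag-of-characters encoder standing in for a real model (ToyDenseModel is purely illustrative; BEIR passes corpus entries as dicts with "title" and "text" keys):

```python
from typing import Dict, List

import numpy as np


class ToyDenseModel:
    def __init__(self, dim: int = 26):
        self.dim = dim  # one slot per lowercase letter

    def _embed(self, texts: List[str]) -> np.ndarray:
        # Count letter frequencies (toy stand-in for a neural encoder).
        vecs = np.zeros((len(texts), self.dim))
        for i, text in enumerate(texts):
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vecs[i, ord(ch) - ord("a")] += 1.0
        return vecs

    def encode_queries(self, queries: List[str], batch_size: int = 16, **kwargs) -> np.ndarray:
        return self._embed(queries)

    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int = 16, **kwargs) -> np.ndarray:
        # Concatenate title and text, as dense retrievers typically do.
        return self._embed(
            [(doc.get("title", "") + " " + doc.get("text", "")).strip() for doc in corpus]
        )


model = ToyDenseModel()
q_emb = model.encode_queries(["apple pie"])
d_emb = model.encode_corpus([{"title": "apple", "text": "pie"}])
cos = float(q_emb[0] @ d_emb[0]) / (
    np.linalg.norm(q_emb[0]) * np.linalg.norm(d_emb[0])
)
```

Swapping _embed for calls into a real model (e.g. a tuned SentenceTransformer's encode) is all that's needed to make this class usable with DRES.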

Hope it helps!

Kind Regards, Nandan

corticalstack commented 3 years ago

Perfect, all good (and I still remain blown away by how well BM25 does on many benchmarks...).