Closed: corticalstack closed this 3 years ago
Hi @corticalstack,
Rerankers such as the cross-encoder define a predict method; they do not encode queries or documents separately.
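For contrast, the reranker interface looks like this — a minimal sketch with a toy word-overlap score standing in for a real cross-encoder (class and scoring logic are illustrative, not the library's implementation):

```python
from typing import List, Tuple

import numpy as np

class ToyCrossEncoder:
    """Sketch of the reranker interface: scores (query, document) pairs
    jointly via predict, instead of encoding each side separately."""

    def predict(self, sentence_pairs: List[Tuple[str, str]]) -> np.ndarray:
        # Toy relevance score: number of shared words (illustration only).
        return np.array([
            float(len(set(q.lower().split()) & set(d.lower().split())))
            for q, d in sentence_pairs
        ])
```

A real cross-encoder runs both texts through the model together, which is why there is no encode_queries or encode_corpus to call on it.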
In the example you are trying to run, how about passing the path of your tuned DistilBERT model directly to models.SentenceBERT itself, as shown below?
```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(models.SentenceBERT(distilbert_model_path), batch_size=128)
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100])
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)
```
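Under the hood, reranking with score_function="cos_sim" re-scores each query's first-stage (e.g. BM25) candidates by cosine similarity of the embeddings and keeps only the top_k. A rough sketch of that logic, using a toy fixed-vocabulary embedder in place of a real encoder (function names and the vocabulary are assumptions for illustration):

```python
import numpy as np

# Toy fixed-vocabulary embedder standing in for a real encoder (illustration only).
VOCAB = {"fast": 0, "red": 1, "car": 2, "slow": 3, "blue": 4, "boat": 5, "a": 6}

def toy_embed(text: str) -> np.ndarray:
    vec = np.zeros(len(VOCAB))
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    return vec

def rerank_cos_sim(queries: dict, corpus: dict, results: dict, top_k: int) -> dict:
    """Re-score each query's candidate documents by cosine similarity
    and keep only the top_k highest-scoring ones."""
    reranked = {}
    for qid, query in queries.items():
        q_emb = toy_embed(query)
        scored = {}
        for doc_id in results[qid]:  # candidates from first-stage retrieval
            d_emb = toy_embed(corpus[doc_id]["text"])
            denom = np.linalg.norm(q_emb) * np.linalg.norm(d_emb)
            scored[doc_id] = float(q_emb @ d_emb / denom) if denom else 0.0
        top = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        reranked[qid] = dict(top)
    return reranked
```

Note how a candidate that BM25 ranked higher can drop below a semantically closer one after the cosine-similarity pass.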
In the future, if you wish to run a custom model, provide a custom class with encode_queries and encode_corpus functions. Find a template below:
```python
from typing import Dict, List

import numpy as np

from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

class YourCustomModel:
    def __init__(self, model_path=None, **kwargs):
        self.model = None  # ---> HERE load your custom model

    # Write your own query encoding function (returns query embeddings as a numpy array)
    def encode_queries(self, queries: List[str], batch_size: int, **kwargs) -> np.ndarray:
        pass

    # Write your own corpus encoding function (returns document embeddings as a numpy array)
    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int, **kwargs) -> np.ndarray:
        pass

model = DRES(YourCustomModel(my_neural_model_path), batch_size=128)
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100])
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)
```
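To make the contract concrete, here is the template filled in with a toy hashing embedder (class name, dimensionality, and _embed are assumptions for illustration — a real model would replace _embed). The key point is that both functions return a 2-D numpy array, one row per query or document:

```python
import zlib
from typing import Dict, List

import numpy as np

class ToyDenseModel:
    def __init__(self, dim: int = 64, **kwargs):
        self.dim = dim  # embedding dimensionality

    def _embed(self, text: str) -> np.ndarray:
        # Deterministic bag-of-hashed-words vector (stand-in for a real encoder).
        vec = np.zeros(self.dim)
        for word in text.lower().split():
            vec[zlib.crc32(word.encode()) % self.dim] += 1.0
        return vec

    def encode_queries(self, queries: List[str], batch_size: int = 16, **kwargs) -> np.ndarray:
        return np.stack([self._embed(q) for q in queries])

    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int = 16, **kwargs) -> np.ndarray:
        # BEIR corpus entries are dicts with "title" and "text" fields.
        return np.stack([self._embed(doc.get("title", "") + " " + doc["text"]) for doc in corpus])
```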
Hope it helps!
Kind Regards, Nandan
Perfect, all good (and I still remain blown away by how well BM25 does on many benchmarks...)
Hi,
After query generation with T5Tokenizer and T5ForConditionalGeneration, then using the generated queries to tune models.Transformer('distilbert-base-uncased'), following:
https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation
I am now attempting to use this model to re-rank results from BM25 as per following example:
https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/reranking/evaluate_bm25_sbert_reranking.py
I instantiate my model with:
SentenceTransformer(self.model_path)
followed by:
```python
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(my_neural_model, batch_size=128)
dense_retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 3, 5, 10, 100])
rerank_results = dense_retriever.rerank(corpus, queries, results, top_k=100)
```
But get:
'SentenceTransformer' object has no attribute 'encode_queries'
I can see that examples of re-rankers, such as the cross-encoder, have defined methods encode_queries and encode_corpus.
Grateful for any pointer on what obvious errors I may be making, and how I can use my own model with cos-sim to rerank. Many thanks