denser-org / denser-retriever

An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.
https://retriever.denser.ai
MIT License
206 stars 24 forks

Questions about the reranker model: do I need to modify the xgboost reranker for different datasets? #10

Open forestemperor opened 3 months ago

forestemperor commented 3 months ago

As the title says.

zhiheng-huang commented 3 months ago

1) In https://retriever.denser.ai/docs/experiments/mteb_retrieval, we stated that "For each dataset in MTEB, we trained an xgboost models on the training dataset and tested on the test dataset." So yes, you need to use a different re-ranker model for each dataset to replicate the results for the 15 datasets reported at that URL.
2) Because MSMARCO is a large dataset, you can try it first to see whether it fits your use case and data.
3) We may later introduce a global model trained on all datasets combined.
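For anyone who wants to see what "one xgboost model per dataset" could look like in practice, here is a rough sketch. It is not the denser-retriever training code: the function names, the `TrainingSet` layout, the feature rows, and the hyperparameters are all hypothetical and chosen only for illustration. It assumes each dataset has already been turned into per-(query, passage) feature rows with relevance labels, grouped by query, and that `xgboost` and `numpy` are installed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout for one dataset's training split:
# (feature rows, relevance labels, group sizes).
# Each feature row describes one (query, passage) pair; group sizes give the
# number of consecutive rows that belong to each query.
TrainingSet = Tuple[List[List[float]], List[int], List[int]]


def train_xgboost_ranker(features, labels, group_sizes):
    """Fit one XGBoost ranking model (requires xgboost and numpy)."""
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.asarray(features, dtype=float),
                         label=np.asarray(labels))
    dtrain.set_group(group_sizes)  # tell XGBoost which rows share a query
    # Illustrative hyperparameters, not the ones used in the reported results.
    params = {"objective": "rank:pairwise", "eta": 0.1, "max_depth": 6}
    return xgb.train(params, dtrain, num_boost_round=100)


def train_per_dataset(
    datasets: Dict[str, TrainingSet],
    train_fn: Callable = train_xgboost_ranker,
) -> dict:
    """Train a separate reranker for each dataset, keyed by dataset name."""
    models = {}
    for name, (features, labels, group_sizes) in datasets.items():
        # Basic consistency check before handing the data to the trainer.
        if len(features) != len(labels) or sum(group_sizes) != len(features):
            raise ValueError(f"{name}: features, labels and group sizes disagree")
        models[name] = train_fn(features, labels, group_sizes)
    return models
```

At query time you would pick `models[dataset_name]` for the dataset you are evaluating and score candidates with `booster.predict(xgb.DMatrix(rows))`. A global model, as mentioned in point 3, would instead concatenate all datasets into a single training set.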