beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Question about ideal architecture for deep IR #59

Open pablogranolabar opened 2 years ago

pablogranolabar commented 2 years ago

Hi again @NThakur20! I have an interesting search project: a golden set of search queries and their results for a search application in the financial services domain. One of the search types targets financial analysts, e.g. a partial analyst name as the query, which the search engine answers with that analyst's full contact particulars. Using a fine-tuned T5.1 large model I am achieving 97%+ classification accuracy on observed searches. The issue is generalization to new searches, where the analyst exists only in the database and the model has to generate the response from unlabeled contact data in the analyst contact database. My thought was to either train an MS MARCO T5 model in an unsupervised fashion on the contact database, in the hope that it generalizes to unobserved search queries, or populate a deep IR pipeline with those records and use that for analyst contact retrieval. Is this a reasonable use case for BEIR?
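To make the second option (populating a retrieval pipeline with the contact records) concrete, here is a minimal sketch. It lays the data out in the `corpus` / `queries` / `qrels` dictionary shapes that BEIR uses, but the retriever itself is a plain character-trigram cosine scorer written with the standard library only. It is a stand-in for a real dense or lexical model, not anything from BEIR. All records, names, email addresses, and the helper functions `char_ngrams`, `cosine`, and `retrieve` are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical contact records, shaped like a BEIR corpus:
# doc_id -> {"title": ..., "text": ...}
corpus = {
    "a1": {"title": "Jane Whitfield", "text": "Equity analyst, jane.whitfield@example.com"},
    "a2": {"title": "Janet Wu", "text": "Credit analyst, janet.wu@example.com"},
    "a3": {"title": "Marcus Oyelaran", "text": "Rates analyst, marcus.oyelaran@example.com"},
}
# Partial-name queries and their gold answers (query_id -> {doc_id: relevance}).
queries = {"q1": "whitf", "q2": "oyel"}
qrels = {"q1": {"a1": 1}, "q2": {"a3": 1}}


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(corpus, queries, top_k=2):
    """Score every record against every query; return query_id -> {doc_id: score}."""
    # Index only the name field, since the queries are partial names.
    doc_vecs = {doc_id: char_ngrams(doc["title"]) for doc_id, doc in corpus.items()}
    results = {}
    for qid, query in queries.items():
        q_vec = char_ngrams(query)
        scores = {doc_id: cosine(q_vec, d_vec) for doc_id, d_vec in doc_vecs.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        results[qid] = dict(ranked)  # dicts keep insertion order, so best hit comes first
    return results


results = retrieve(corpus, queries)

# Top-1 accuracy against the golden set.
hits = sum(1 for qid, ranked in results.items() if next(iter(ranked)) in qrels[qid])
print(f"top-1 accuracy: {hits / len(queries):.2f}")
```

Because nothing here is trained on query/answer pairs, a newly added analyst record becomes searchable as soon as it is in `corpus`, which is the generalization property the generative model lacks. The `results` dictionary has the same query-to-document-scores shape as `qrels`, so the golden set could in principle be scored with standard ranking metrics once a real retriever replaces the trigram scorer.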