Closed: vrdn-23 closed this issue 3 years ago
Does this help?
https://github.com/castorini/pyserini/blob/master/docs/usage-interactive-search.md#how-do-i-perform-dense-and-hybrid-retrieval
(replace the TctColBertQueryEncoder with AnceQueryEncoder)
Thanks for the quick response!
Actually, let me be clearer. Don't we already need a FAISS index for the custom data (TREC CAsT) I am working with in order for the DenseSearcher to do retrieval? The index I have is not registered in the DINDEX info, so I would need to load it locally.
So I guess my question is, is there a substitute method/approach you would recommend other than 'from_prebuilt_index' to locally load a custom index I have?
searcher = SimpleDenseSearcher.from_prebuilt_index( 'custom_data_trec_cast', encoder )
The from_prebuilt_index supports loading from local too.
i.e. searcher = SimpleDenseSearcher.from_prebuilt_index(<path to local index>, encoder)
We don't have an API for creating a Faiss index within the Pyserini package itself, but there are scripts to create the index, e.g. https://github.com/castorini/pyserini/blob/master/scripts/ance/encode_corpus_msmarco_passage.py
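For anyone reading along, here is a rough sketch of what searching a flat inner-product index amounts to, written in plain Python so it runs without Faiss. It is a conceptual stand-in, not the linked script: the function name flat_ip_search and the toy vectors are made up for illustration, and it assumes the index is a flat (exact) inner-product index over the encoded passages.

```python
from typing import List, Sequence, Tuple

def flat_ip_search(
    corpus: Sequence[Tuple[str, Sequence[float]]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every document by inner product with the query and return the top k.

    Hypothetical helper for illustration only; a real Faiss index does the
    same exhaustive comparison far more efficiently.
    """
    scored = [
        (doc_id, sum(q * d for q, d in zip(query, vec)))
        for doc_id, vec in corpus
    ]
    # Highest inner product first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-dimensional "embeddings" (made up; real ANCE vectors are much larger).
corpus = [("d1", [1.0, 0.0]), ("d2", [0.0, 1.0]), ("d3", [0.5, 0.5])]
hits = flat_ip_search(corpus, [1.0, 0.0], k=2)
```

The point is only that the index stores one vector per document id, so whatever script builds it has to keep the ids and the vectors in the same order.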
Thanks for that script btw! I understand now how to make it load my own index. I'm using a script of my own to create my Faiss Index so this helps clear up a couple of my own doubts.
I do have one question though: on line 28 of the script, shouldn't we also be passing the attention_mask into the model, since the input batch will contain pad tokens too? I'm not sure whether Hugging Face takes care of that internally, but I thought I should ask.
Aah. Nvm. I see that AnceEncoder is taking care of that in the forward! :)
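In case it helps someone else with the same doubt, here is a small sketch of how padding and the attention mask relate. It uses plain Python lists rather than tensors, the helper names pad_batch and mask_from_ids are hypothetical, the token ids are made up, and pad_id=0 is an assumption (the real pad id depends on the tokenizer).

```python
from typing import List, Tuple

def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build the matching mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding the model should not attend to.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask

def mask_from_ids(input_ids: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Recover the mask from already-padded ids by marking non-pad positions."""
    return [[int(tok != pad_id) for tok in seq] for seq in input_ids]

input_ids, attention_mask = pad_batch([[101, 7592, 102], [101, 102]])
```

Because the mask can be recomputed from the padded ids whenever the pad id is known, an encoder can build it inside its forward pass when the caller does not pass one.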
Thanks again! I'll close the issue seeing as I got what I was looking for!
Hey everyone,
I was wondering if pyserini currently offers any functionality to operate on a custom FAISS index for dense search and hybrid search?
I am currently in the process of creating a FAISS index using the ANCE encoder for the TREC CAsT data (which I'd also look forward to adding here once I have it up and running and have confirmed it works), and was wondering if there is a way for me to use it in tandem with the simple searcher offered by pyserini.
Thanks for the great work and looking forward to a reply!