Closed DanielKohn1208 closed 2 months ago
hey @manveertamber can you take a look at this?
@DanielKohn1208 can you add some documentation to explain how to fetch document content for documents retrieved using a dense index: https://github.com/castorini/pyserini/blob/master/docs/usage-fetch.md. It looks like right now we only explain how to do this for a lucene index
We can call .doc() on a FaissSearcher too, which routes to the lucene index .doc() but the documentation doesn't explain this either.
@DanielKohn1208 looks good! But maybe change the headings in the documentation to Using a Lucene Index and Using a Faiss Index instead of Using a Sparse Representation and Using a Dense Representation. And you should add some example code in the documentation that showcases your changes. Like calling .raw() on some retrieved hits.
@manveertamber After going through the documentation, I think that I approached this issue in completely the wrong way. As it is now, my fix actually doesn't actually solve any problems. Someone could already get a hit's raw content from a FaissIndex as follows
from pyserini.search.faiss import FaissSearcher, AutoQueryEncoder
DENSE_INDEX = 'beir-v1.0.0-nfcorpus.bge-base-en-v1.5'
encoder = AutoQueryEncoder('BAAI/bge-base-en-v1.5', device='cpu', pooling='mean', l2_norm=True)
searcher = FaissSearcher.from_prebuilt_index(DENSE_INDEX, encoder)
hits = searcher.search('How to Help Prevent Abdominal Aortic Aneurysms')
doc = searcher.doc(hits[0].docid)
raw_content = doc.raw()
Also note that my original assumption about the LuceneSearcher.search()
method returning a list of objects with a raw()
method was incorrect. Obtaining the raw content of one of these hits would be similar to above code.
Perhaps I am completely misunderstanding the original issue, but I think all that is actually needed is a documentation update?
Yeah I agree. https://github.com/castorini/pyserini/issues/1548 wanted to get passage text from a DenseSearchResult, but the right way to do this is to just to call searcher.doc(docid), which is possible with a FaissSearcher, we just needed the documentation to reflect this. @lintool is this enough to close https://github.com/castorini/pyserini/issues/1856?
Should I close this PR and create a new one at some point to update the docs?
Should I close this PR and create a new one at some point to update the docs?
Yeah that works.
Closing PR
closes #1856 by adding a
raw
method to bothDenseSearchResult
andPRFDenseSearchResult
Had to change how
DenseSearchResult
andPRFDenseSearchResult
is initialized as well.