castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0
1.66k stars, 370 forks

Issue with fetching raw documents #1923

Closed: waylight3 closed this issue 3 months ago

waylight3 commented 3 months ago
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-nfcorpus.splade-pp-ed')
doc = searcher.doc('MED-10')
print(doc.contents()) # None
print(doc.raw()) # None

When I fetch a document from the beir-v1.0.0-nfcorpus.splade-pp-ed prebuilt index, both doc.contents() and doc.raw() return None. With the beir-v1.0.0-nfcorpus.flat index, the same calls return the correct document as expected. Is this a property of the prebuilt index itself, or a bug in the library? Any guidance or suggestions for resolving it would be greatly appreciated.
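A possible workaround, sketched under two assumptions: that the flat index uses the same document IDs as the SPLADE index, and that it is acceptable to open a second searcher just for document lookup. The helper `first_raw` and the function `open_nfcorpus_searchers` are hypothetical names introduced for illustration and are not part of Pyserini; the only Pyserini calls used are the ones already shown above (`LuceneSearcher.from_prebuilt_index`, `searcher.doc`, `doc.raw`).

```python
def first_raw(docid, searchers):
    """Return the first non-None raw text for docid, trying each searcher in order.

    Hypothetical helper: works with any object exposing .doc(docid) that
    returns either None or an object with a .raw() method.
    """
    for searcher in searchers:
        doc = searcher.doc(docid)
        if doc is None:
            # This index does not contain the document at all.
            continue
        raw = doc.raw()
        if raw is not None:
            return raw
    return None


def open_nfcorpus_searchers():
    """Open the two prebuilt indexes named in this issue (downloads on first use)."""
    # Imported lazily so the helper above can be used without Pyserini installed.
    from pyserini.search.lucene import LuceneSearcher

    sparse = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-nfcorpus.splade-pp-ed')
    flat = LuceneSearcher.from_prebuilt_index('beir-v1.0.0-nfcorpus.flat')
    return sparse, flat
```

Usage would look like `sparse, flat = open_nfcorpus_searchers()` followed by `first_raw('MED-10', [sparse, flat])`. This falls back to the flat index whenever the SPLADE index returns None for the raw text. It does not explain why the SPLADE index returns None; that question is still for the maintainers.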