castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Having problem loading splade-pp-ed BEIR prebuilt indexes #1818

Closed zwc662 closed 6 months ago

zwc662 commented 6 months ago

Hello,

I am trying to reproduce the experimental results of rank_llm.

However, I cannot load the splade-pp-ed BEIR prebuilt indexes with LuceneImpactSearcher using pyserini 0.24.0.

I noticed that the latest pyserini release, 0.24.0, came out in Dec 2023, while the splade-pp-ed BEIR prebuilt indexes were not added until 1764 in Jan 2024.

I tried building pyserini from source by git cloning this repo and then running python setup.py install.

Then I got an error when importing LuceneImpactSearcher. The error says that it cannot find any jar file under pyserini/pyserini/resources/jars.
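One way to confirm what is actually in that directory before importing is sketched below. This is only a minimal check, not part of pyserini: the helper name find_fatjars is hypothetical, and the file name pattern anserini-*-fatjar.jar is an assumption based on the jar name quoted in this thread.

```python
from pathlib import Path


def find_fatjars(jar_dir):
    """Return the Anserini fatjar files found in jar_dir, sorted by name."""
    jar_dir = Path(jar_dir)
    if not jar_dir.is_dir():
        # A source checkout may not have this directory at all.
        return []
    # Pattern assumed from the file name quoted in this thread.
    return sorted(p for p in jar_dir.glob("anserini-*-fatjar.jar") if p.is_file())


if __name__ == "__main__":
    # Path as quoted in this thread, relative to where the repo was cloned.
    jars = find_fatjars("pyserini/pyserini/resources/jars")
    if jars:
        print("Found:", ", ".join(j.name for j in jars))
    else:
        print("No Anserini fatjar found")
```

An empty result would be consistent with the import error described here, since a fresh clone does not necessarily ship the jar.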

I tried to resolve it by copying anserini-0.24.0-fatjar.jar from the 0.24.0 release into the folder pyserini/pyserini/resources/jars. But then I got this error when loading the prebuilt beir-v1.0.0-trec-covid.test.splade-pp-ed:

'beir-v1.0.0-trec-covid.test.splade-pp-ed': JTopics.BEIR_V1_0_0_TREC_COVID_TEST_SPLADE_PP_ED,
AttributeError: type object 'io.anserini.search.topicreader.Topics' has no attribute 'BEIR_V1_0_0_TREC_COVID_TEST_SPLADE_PP_ED'. Did you mean: 'BEIR_V1_0_0_TREC_COVID_TEST_UNCOIL_NOEXP'?

I guess the jar file needs to be updated as well.
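A rough way to test that guess without starting the JVM is to look for the constant name inside the jar. The sketch below is a heuristic under stated assumptions: the helper jar_has_topic is hypothetical, the class path is derived from the class name in the AttributeError above, and it relies on Java class files storing enum constant names as plain strings. Because it is a substring match, a longer constant that contains the same name would also count as a hit.

```python
import zipfile

# Class path derived from the class name in the AttributeError quoted above.
TOPICS_CLASS = "io/anserini/search/topicreader/Topics.class"


def jar_has_topic(jar_path, constant_name):
    """Return True if constant_name appears in the Topics class inside the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            class_bytes = jar.read(TOPICS_CLASS)
        except KeyError:
            # The jar does not contain the Topics class at the assumed path.
            return False
    # Heuristic substring match on the raw class file bytes.
    return constant_name.encode("ascii") in class_bytes
```

For example, calling jar_has_topic on the copied fatjar with "BEIR_V1_0_0_TREC_COVID_TEST_SPLADE_PP_ED" would indicate whether that jar predates the splade-pp-ed topics.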

So how should I load splade-pp-ed BEIR prebuilt indexes with LuceneImpactSearcher?

zwc662 commented 6 months ago

NVM, I just copied the anserini fatjar to pyserini/pyserini/resources/jars and it works.