jayavanth closed this issue 1 month ago.
When I run the Java code in the retrieval step:
```bash
export ANSERINI_JAR=anserini-0.36.1-fatjar.jar
export OUTPUT_DIR="runs"
TOPICS=(rag24.raggy-dev rag24.researchy-dev)
for t in "${TOPICS[@]}"; do
  java -cp $ANSERINI_JAR io.anserini.search.SearchCollection \
    -index msmarco-v2.1-doc-segmented \
    -topics $t \
    -output $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.${t}.txt \
    -threads 16 \
    -bm25 \
    -hits 100 \
    -outputRerankerRequests $OUTPUT_DIR/retrieve_results_msmarco-v2.1-doc-segmented.bm25.${t}_top100.jsonl
done
```
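I get the following output. The `rag24.raggy-dev` run completes, but `rag24.researchy-dev` fails because the name does not resolve to valid topics: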
```
Index file already exists! Skip downloading.
Index folder already exists!
2024-07-21 01:53:23,970 INFO [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-07-21 01:53:23,974 INFO [main] search.SearchCollection (SearchCollection.java:1009) - Index: /home/jay/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
Jul 21, 2024 1:53:23 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-07-21 01:53:24,166 INFO [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-07-21 01:53:24,166 INFO [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-07-21 01:53:24,166 INFO [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-07-21 01:53:24,167 INFO [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 100
2024-07-21 01:53:24,167 INFO [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-07-21 01:53:24,168 INFO [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-07-21 01:53:24,169 INFO [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-07-21 01:53:24,169 INFO [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-07-21 01:53:24,170 INFO [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
2024-07-21 01:53:24,235 INFO [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-07-21 01:53:24,235 INFO [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
2024-07-21 01:53:26,241 INFO [pool-3-thread-14] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 100 queries processed
2024-07-21 01:53:27,116 INFO [pool-2-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:925) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 120 queries processed in 00:00:02 = ~42.08 q/s
2024-07-21 01:53:28,158 INFO [main] search.SearchCollection (SearchCollection.java:1418) - Total run time: 00:00:04
Index file already exists! Skip downloading.
Index folder already exists!
2024-07-21 01:53:28,939 INFO [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-07-21 01:53:28,942 INFO [main] search.SearchCollection (SearchCollection.java:1009) - Index: /home/jay/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
Jul 21, 2024 1:53:28 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-07-21 01:53:29,112 INFO [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-07-21 01:53:29,113 INFO [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-07-21 01:53:29,113 INFO [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-07-21 01:53:29,113 INFO [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 100
2024-07-21 01:53:29,114 INFO [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-07-21 01:53:29,114 INFO [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-07-21 01:53:29,115 INFO [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-07-21 01:53:29,115 INFO [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-07-21 01:53:29,115 INFO [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
Error: "rag24.researchy-dev" does not refer to valid topics.
2024-07-21 01:53:29,168 INFO [main] search.SearchCollection (SearchCollection.java:1418) - Total run time: 00:00:00
```
I was able to fix it by downloading the topics file and then passing it to `SearchCollection` together with an explicit `-topicReader`.
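A rough sketch of that workaround: a hypothetical bash helper, `topic_args`, that passes a locally downloaded topics file plus `-topicReader` when one exists, and otherwise falls back to the built-in topic name that already works for `rag24.raggy-dev`. The file layout (`topics/topics.<topic>.tsv`) and the `TsvString` reader are assumptions, not taken from this issue; check them against the file you actually download.

```bash
#!/usr/bin/env bash
# Hypothetical helper sketching the workaround described above.
# ASSUMPTIONS (not from the issue): a downloaded topics file is stored as
# <dir>/topics.<topic>.tsv, and TsvString is the right topic reader for it.

# Fill the global array TOPIC_ARGS with the SearchCollection topic arguments
# for one topic. Arguments: $1 = topic name, $2 = topics directory (default "topics").
topic_args() {
  local t="$1"
  local dir="${2:-topics}"
  local file="$dir/topics.${t}.tsv"
  if [[ -f "$file" ]]; then
    # Local file found: pass its path and name the reader explicitly.
    TOPIC_ARGS=(-topics "$file" -topicReader TsvString)
  else
    # No local file: fall back to the built-in topic name.
    TOPIC_ARGS=(-topics "$t")
  fi
}
```

Inside the original loop, call `topic_args "$t"` first and then pass `"${TOPIC_ARGS[@]}"` to `SearchCollection` in place of `-topics $t`. A global array is used instead of printed output so that paths containing spaces survive intact and the helper works on older bash versions without `mapfile`.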