castorini / ragnarok

Retrieval-Augmented Generation battle!
Apache License 2.0

Error running instructions in rag24.md Retrieval step #6

Closed jayavanth closed 1 month ago

jayavanth commented 1 month ago

When I run the Java command from the retrieval step:

export ANSERINI_JAR=anserini-0.36.1-fatjar.jar
export OUTPUT_DIR="runs"
TOPICS=(rag24.raggy-dev rag24.researchy-dev)
# Run BM25 retrieval once per topic set.
for t in "${TOPICS[@]}"; do
    java -cp "$ANSERINI_JAR" io.anserini.search.SearchCollection \
        -index msmarco-v2.1-doc-segmented \
        -topics "$t" \
        -output "$OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.${t}.txt" \
        -threads 16 \
        -bm25 \
        -hits 100 \
        -outputRerankerRequests "$OUTPUT_DIR/retrieve_results_msmarco-v2.1-doc-segmented.bm25.${t}_top100.jsonl"
done

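I get the following output. The run for rag24.raggy-dev completes, but rag24.researchy-dev fails with an error: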

Index file already exists! Skip downloading.
Index folder already exists!
2024-07-21 01:53:23,970 INFO  [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-07-21 01:53:23,974 INFO  [main] search.SearchCollection (SearchCollection.java:1009) - Index: /home/jay/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
Jul 21, 2024 1:53:23 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-07-21 01:53:24,166 INFO  [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-07-21 01:53:24,166 INFO  [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-07-21 01:53:24,166 INFO  [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-07-21 01:53:24,167 INFO  [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 100
2024-07-21 01:53:24,167 INFO  [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-07-21 01:53:24,168 INFO  [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-07-21 01:53:24,169 INFO  [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-07-21 01:53:24,169 INFO  [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-07-21 01:53:24,170 INFO  [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
2024-07-21 01:53:24,235 INFO  [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-07-21 01:53:24,235 INFO  [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
2024-07-21 01:53:26,241 INFO  [pool-3-thread-14] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 100 queries processed
2024-07-21 01:53:27,116 INFO  [pool-2-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:925) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 120 queries processed in 00:00:02 = ~42.08 q/s
2024-07-21 01:53:28,158 INFO  [main] search.SearchCollection (SearchCollection.java:1418) - Total run time: 00:00:04
Index file already exists! Skip downloading.
Index folder already exists!
2024-07-21 01:53:28,939 INFO  [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-07-21 01:53:28,942 INFO  [main] search.SearchCollection (SearchCollection.java:1009) - Index: /home/jay/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
Jul 21, 2024 1:53:28 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-07-21 01:53:29,112 INFO  [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-07-21 01:53:29,113 INFO  [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-07-21 01:53:29,113 INFO  [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-07-21 01:53:29,113 INFO  [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 100
2024-07-21 01:53:29,114 INFO  [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-07-21 01:53:29,114 INFO  [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-07-21 01:53:29,115 INFO  [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-07-21 01:53:29,115 INFO  [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-07-21 01:53:29,115 INFO  [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
Error: "rag24.researchy-dev" does not refer to valid topics.
2024-07-21 01:53:29,168 INFO  [main] search.SearchCollection (SearchCollection.java:1418) - Total run time: 00:00:00

jayavanth commented 1 month ago

I was able to fix it by downloading the topics file and then passing a -topicReader option.
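
For anyone hitting the same error, here is a rough sketch of what that workaround might look like. It is not the exact command used above. The local file name topics.rag24.researchy-dev.txt and the TsvString reader are assumptions (the reader has to match the actual format of the file you downloaded), and build_search_cmd is a hypothetical helper that only prints the command so it can be inspected before running.

```shell
#!/usr/bin/env bash
# Sketch: point -topics at a local topics file and name the reader explicitly.
# ASSUMPTIONS (not from the thread): the local file name passed below and the
# TsvString reader; choose the reader that matches your downloaded file.
ANSERINI_JAR="${ANSERINI_JAR:-anserini-0.36.1-fatjar.jar}"
OUTPUT_DIR="${OUTPUT_DIR:-runs}"

# Hypothetical helper: prints the search command for topic set $1,
# reading topics from the local file $2. It does not execute anything.
build_search_cmd() {
    local name="$1" topics_file="$2"
    local cmd=(java -cp "$ANSERINI_JAR" io.anserini.search.SearchCollection
        -index msmarco-v2.1-doc-segmented
        -topics "$topics_file"
        -topicReader TsvString
        -output "$OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.${name}.txt"
        -threads 16 -bm25 -hits 100
        -outputRerankerRequests "$OUTPUT_DIR/retrieve_results_msmarco-v2.1-doc-segmented.bm25.${name}_top100.jsonl")
    printf '%s ' "${cmd[@]}"
    printf '\n'
}

# Dry run: print the command for the topic set that failed.
build_search_cmd rag24.researchy-dev topics.rag24.researchy-dev.txt
```

Once the printed command looks right, it can be pasted into the shell to run the retrieval for that topic set.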