castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0
1.69k stars 377 forks source link

ONNX Model not found #2028

Open trevoradriaanse opened 4 days ago

trevoradriaanse commented 4 days ago

Hello!

I'm running the installation instructions for HNSW

python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 --dense --hnsw \
  --index msmarco-v1-passage.bge-base-en-v1.5.hnsw \
  --topics dl19-passage \
  --onnx-encoder BgeBaseEn15 \
  --output run.msmarco-v1-passage.bge-base-en-v1.5.lucene-hnsw.onnx.dl19.txt \
  --hits 1000 --ef-search 1000

and I'm getting the following traceback:

jnius.JavaException: JVM exception occurred: Unable to load Encoder "BgeBaseEn15". java.lang.IllegalArgumentException

I located the encoder model in my Hugging Face cache and noticed there was a safetensors formatted model, but I don't see the ONNX model format. Could this be the issue?

lintool commented 4 days ago

Hi @trevoradriaanse thanks for reporting this. Just to confirm, you're trying to do the PyPI install, correct? Can you confirm that you're pulling the latest version, 0.43.0?

trevoradriaanse commented 4 days ago

Hi @lintool that's correct, I verified that pyserini==0.43.0.

lintool commented 2 days ago

Hrm... I can't seem to reproduce this error. Can you try wget on these files?

wget https://rgw.cs.uwaterloo.ca/pyserini/data/bge-base-en-v1.5-optimized.onnx
wget https://rgw.cs.uwaterloo.ca/pyserini/data/bge-base-en-v1.5-vocab.txt

The server hosting the models has been flaky of late...

Perhaps you can try again?

lintool commented 2 days ago

The Exception is thrown from this location: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/HnswDenseSearcher.java#L123

But BgeBaseEn15 encoder is defined here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/encoder/dense/BgeBaseEn15Encoder.java

And seems fine...?

trevoradriaanse commented 2 days ago

@lintool Thanks for this info. After running wget, I have bge-base-en-v1.5-optimized.onnx and bge-base-en-v1.5-vocab.txt, but I'm still getting the same exception. I placed these in ~/.cache/pyserini/encoders.

lintool commented 2 days ago

Are you sure there's not another anserini fatjar that's conflicting? E.g., you did a pip install -e that's pointing to an old jar?

trevoradriaanse commented 2 days ago

In pyserini/lib/python3.10/site-packages/pyserini/resources/jars, I have just one fatjar, anserini-0.38.0-fatjar.jar.

lintool commented 2 days ago

I just tried again from scratch and cannot repro you error... :(

What Java version are you on? Mine is:

% java --version
openjdk 21.0.5 2024-10-15 LTS
OpenJDK Runtime Environment Zulu21.38+21-CA (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Zulu21.38+21-CA (build 21.0.5+11-LTS, mixed mode, sharing)
trevoradriaanse commented 2 days ago

I appreciate the help. I get

openjdk 21.0.5-internal 2024-10-15
OpenJDK Runtime Environment (build 21.0.5-internal-adhoc.conda.src)
OpenJDK 64-Bit Server VM (build 21.0.5-internal-adhoc.conda.src, mixed mode, sharing)
lintool commented 2 days ago

Hrm. Are you having trouble running all ONNX models? Go here: https://castorini.github.io/pyserini/2cr/msmarco-v1-passage.html

Pick another ONNX model and see if it's a BGE-specific issue...

trevoradriaanse commented 2 days ago

Yeah, it seems to be a more general problem. I ran

python -m pyserini.search.lucene \
  --threads 16 --batch-size 512 --dense --hnsw \
  --index msmarco-v1-passage.cosdpr-distil.hnsw \
  --topics dl19-passage \
  --onnx-encoder CosDprDistil \
  --output run.msmarco-v1-passage.cosdpr-distil.lucene-hnsw.onnx.dl19.txt \
  --hits 1000 --ef-search 1000

which encounters the same issue:

jnius.JavaException: JVM exception occurred: Unable to load Encoder "CosDprDistil". java.lang.IllegalArgumentException
lintool commented 2 days ago

Interesting. Can you try with Anserini?

https://github.com/castorini/anserini?tab=readme-ov-file#-try-it

Run the two commands and let me know what you see?

trevoradriaanse commented 1 day ago

I'm seeing this traceback fragment for Launching Search Threads after running the two commands

======== Launching Search Threads ============
2024-11-24 10:57:43,281 INFO  [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
Exception in thread "main" java.lang.RuntimeException
        at io.anserini.search.SearchCollection$SearcherThread.<init>(SearchCollection.java:835)
        at io.anserini.search.SearchCollection.run(SearchCollection.java:1364)
        at io.anserini.search.SearchCollection.main(SearchCollection.java:1412)
lintool commented 1 day ago

I'm perplexed... when I run those two commands, everything works fine. See below for my complete execution trace.

Can you pinpoint where you're diverging?

(pyserini-dev6) jimmylin@iMac-Pro tmp % wget https://repo1.maven.org/maven2/io/anserini/anserini/0.38.0/anserini-0.38.0-fatjar.jar
--2024-11-24 13:38:06--  https://repo1.maven.org/maven2/io/anserini/anserini/0.38.0/anserini-0.38.0-fatjar.jar
Resolving repo1.maven.org (repo1.maven.org)... 2a04:4e42:1e::209, 151.101.124.209
Connecting to repo1.maven.org (repo1.maven.org)|2a04:4e42:1e::209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 187526278 (179M) [application/java-archive]
Saving to: ‘anserini-0.38.0-fatjar.jar’

anserini-0.38.0-fatjar.jar                                                           100%[====================================================================================================================================================================================================================>] 178.84M  51.5MB/s    in 3.6s    

2024-11-24 13:38:10 (49.6 MB/s) - ‘anserini-0.38.0-fatjar.jar’ saved [187526278/187526278]

(pyserini-dev6) jimmylin@iMac-Pro tmp % java -cp anserini-0.38.0-fatjar.jar io.anserini.search.SearchCollection \
  -index msmarco-v1-passage.splade-pp-ed \
  -topics msmarco-v1-passage.dev \
  -encoder SpladePlusPlusEnsembleDistil \
  -output run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt \
  -impact -pretokenized
msmarco-v1-passage.splade-pp-ed 100% │███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████│ 2102230/2102230 (0:01:27 / 0:00:00) Downloading...
Decompressing index...
Index decompressed successfully!
2024-11-24 13:39:55,715 INFO  [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-11-24 13:39:55,717 INFO  [main] search.SearchCollection (SearchCollection.java:1009) - Index: /Users/jimmylin/.cache/pyserini/indexes/lucene-inverted.msmarco-v1-passage.splade-pp-ed.20230524.a59610.2c008fc36131e27966a72292932358e6
Nov 24, 2024 1:39:55 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-11-24 13:39:55,831 INFO  [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 4
2024-11-24 13:39:55,831 INFO  [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-11-24 13:39:55,831 INFO  [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-11-24 13:39:55,831 INFO  [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 1000
2024-11-24 13:39:55,831 INFO  [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-11-24 13:39:55,881 INFO  [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-11-24 13:39:55,881 INFO  [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
2024-11-24 13:40:06,516 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 100 queries processed
2024-11-24 13:40:13,063 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 200 queries processed
2024-11-24 13:40:20,001 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 300 queries processed
2024-11-24 13:40:27,570 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 400 queries processed
2024-11-24 13:40:38,523 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 500 queries processed
2024-11-24 13:40:50,111 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 600 queries processed
2024-11-24 13:40:57,777 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 700 queries processed
2024-11-24 13:41:05,536 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 800 queries processed
2024-11-24 13:41:13,867 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 900 queries processed
2024-11-24 13:41:21,532 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1000 queries processed
2024-11-24 13:41:27,793 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1100 queries processed
2024-11-24 13:41:34,592 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1200 queries processed
2024-11-24 13:41:41,244 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1300 queries processed
2024-11-24 13:41:50,622 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1400 queries processed
2024-11-24 13:41:58,370 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1500 queries processed
2024-11-24 13:42:05,386 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1600 queries processed
2024-11-24 13:42:14,830 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1700 queries processed
2024-11-24 13:42:23,571 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1800 queries processed
2024-11-24 13:42:32,183 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 1900 queries processed
2024-11-24 13:42:40,854 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2000 queries processed
2024-11-24 13:42:48,597 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2100 queries processed
2024-11-24 13:42:58,171 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2200 queries processed
2024-11-24 13:43:06,772 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2300 queries processed
2024-11-24 13:43:12,299 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2400 queries processed
2024-11-24 13:43:20,951 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2500 queries processed
2024-11-24 13:43:32,919 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2600 queries processed
2024-11-24 13:43:43,206 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2700 queries processed
2024-11-24 13:43:55,918 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2800 queries processed
2024-11-24 13:44:08,059 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 2900 queries processed
2024-11-24 13:44:20,297 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3000 queries processed
2024-11-24 13:44:32,301 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3100 queries processed
2024-11-24 13:44:44,405 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3200 queries processed
2024-11-24 13:44:56,016 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3300 queries processed
2024-11-24 13:45:06,230 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3400 queries processed
2024-11-24 13:45:16,711 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3500 queries processed
2024-11-24 13:45:26,642 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3600 queries processed
2024-11-24 13:45:36,334 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3700 queries processed
2024-11-24 13:45:46,849 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3800 queries processed
2024-11-24 13:45:57,352 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 3900 queries processed
2024-11-24 13:46:08,042 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4000 queries processed
2024-11-24 13:46:18,701 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4100 queries processed
2024-11-24 13:46:28,533 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4200 queries processed
2024-11-24 13:46:38,837 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4300 queries processed
2024-11-24 13:46:50,394 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4400 queries processed
2024-11-24 13:47:03,052 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4500 queries processed
2024-11-24 13:47:17,144 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4600 queries processed
2024-11-24 13:47:32,728 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4700 queries processed
2024-11-24 13:47:53,769 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4800 queries processed
2024-11-24 13:48:17,889 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 4900 queries processed
2024-11-24 13:48:45,684 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5000 queries processed
2024-11-24 13:49:14,770 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5100 queries processed
2024-11-24 13:49:42,819 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5200 queries processed
2024-11-24 13:50:15,414 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5300 queries processed
2024-11-24 13:50:48,380 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5400 queries processed
2024-11-24 13:51:24,235 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5500 queries processed
2024-11-24 13:51:53,145 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5600 queries processed
2024-11-24 13:52:22,449 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5700 queries processed
2024-11-24 13:52:40,688 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5800 queries processed
2024-11-24 13:53:00,007 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 5900 queries processed
2024-11-24 13:53:13,700 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6000 queries processed
2024-11-24 13:53:29,710 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6100 queries processed
2024-11-24 13:53:45,315 INFO  [pool-4-thread-4] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6200 queries processed
2024-11-24 13:53:57,844 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6300 queries processed
2024-11-24 13:54:12,430 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6400 queries processed
2024-11-24 13:54:24,889 INFO  [pool-4-thread-3] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6500 queries processed
2024-11-24 13:54:36,129 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6600 queries processed
2024-11-24 13:54:46,593 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6700 queries processed
2024-11-24 13:54:58,839 INFO  [pool-4-thread-2] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6800 queries processed
2024-11-24 13:55:11,752 INFO  [pool-4-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: impact(), reranker: default: 6900 queries processed
2024-11-24 13:55:22,685 INFO  [pool-3-thread-1] search.SearchCollection$SearcherThread (SearchCollection.java:925) - ranker: impact(), reranker: default: 6980 queries processed in 00:15:24 = ~7.55 q/s
2024-11-24 13:55:30,405 INFO  [main] search.SearchCollection (SearchCollection.java:1418) - Total run time: 00:17:15
trevoradriaanse commented 1 hour ago

I'm seeing

(pyserini) [tadriaanse@test1 scratch]$ java -cp anserini-0.38.0-fatjar.jar io.anserini.search.SearchCollection   -index msmarco-v1-passage.splade-pp-ed   -topics msmarco-v1-passage.dev   -encoder SpladePlusPlusEnsembleDistil   -output run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt   -impact -pretokenized
Index file already exists! Skip downloading.
Index folder already exists!
2024-11-26 07:12:11,789 INFO  [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-11-26 07:12:11,795 INFO  [main] search.SearchCollection (SearchCollection.java:1009) - Index: /home/tadriaanse/.cache/pyserini/indexes/lucene-inverted.msmarco-v1-passage.splade-pp-ed.20230524.a59610.2c008fc36131e27966a72292932358e6
Nov 26, 2024 7:12:11 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-11-26 07:12:12,891 INFO  [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 4
2024-11-26 07:12:12,892 INFO  [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-11-26 07:12:12,893 INFO  [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: false
2024-11-26 07:12:12,894 INFO  [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 1000
2024-11-26 07:12:12,894 INFO  [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-11-26 07:12:13,094 INFO  [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-11-26 07:12:13,095 INFO  [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
Exception in thread "main" java.lang.RuntimeException
        at io.anserini.search.SearchCollection$SearcherThread.<init>(SearchCollection.java:835)
        at io.anserini.search.SearchCollection.run(SearchCollection.java:1364)
        at io.anserini.search.SearchCollection.main(SearchCollection.java:1412)

Comparing the two traces, it looks like I also get through the calls to SearchCollection.java and then ultimately crash before starting to process queries.