Closed Tooba-ts1700550 closed 3 years ago
This is happening somewhere inside Anserini. Is this on Colab? Can you verify that Java version 11 is installed?
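For anyone who wants to check the Java version from a notebook cell, here is a minimal sketch. It is not part of Capreolus or Anserini, and the helper names `parse_java_major` and `installed_java_major` are made up for illustration. It relies on `java -version` printing its banner to stderr and on pre-9 JVMs reporting themselves as `1.x`.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the "1.8.0_282" scheme, so the major version is the second field.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse the result; returns None if java is not on PATH."""
    try:
        # The version banner goes to stderr, so capture both streams.
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_java_major(proc.stderr + proc.stdout)


if __name__ == "__main__":
    print("Detected Java major version:", installed_java_major())
```

A result of 11 only confirms which JVM is first on the PATH; it does not rule out memory or runtime limits in the hosting environment.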
Yes, this is in Colab, and I checked the Java version: it is 11.0.10. The program continued running in other threads after these errors. However, it stops at random during the search phase on msmarcopsg, which takes a very long time. I am not sure why; maybe Colab reaches its limit. This is why I'm trying to use the TPU.
Yeah, I would guess that you're hitting some kind of limit.
The search phase takes a long time on that branch, which is one of the reasons it isn't merged yet. You could avoid this phase entirely by using an existing BM25 run file instead of doing the search yourself. See here for example: https://github.com/capreolus-ir/capreolus/blob/master/capreolus/searcher/anserini.py#L270
I am getting an error when I run this using the existing BM25 run file:
!capreolus rerank.traineval with \
reranker.trainer.tpuname="COLAB" reranker.trainer.tpuzone="COLAB" reranker.trainer.storage="gs://capreolus-bucket/cap-results/" \
rank.searcher.index.stemmer=porter benchmark.name=msmarcopsg \
rank.searcher.name=bm25staticrob04yang19 \
rank.optimize=recall_1000 reranker.name=TFKNRM reranker.trainer.niters=2 optimize=P_20
Error: profane.exceptions.InvalidConfigError: received unknown config key: index
I think this searcher cannot be used on msmarcopsg, since its description says benchmark: name = robust04.yang19, but I get the same error with robust04 as well.
Thank you very much for your help.
In this case you would remove rank.searcher.index.stemmer=porter, since the static searchers use a results file directly and don't need an index.
However, you're right that rank.searcher.name=bm25staticrob04yang19 is only compatible with robust04. To do it this way with msmarco, you would need to create a new static searcher yourself with a file containing first-stage retrieval results from msmarco. (For example, BM25 results on msmarco train, dev, and test.)
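To make the compatibility point concrete, here is a rough sketch of a check that a set of first-stage results actually covers a benchmark's queries. It does not use any Capreolus API. The function name `uncovered_queries` and the toy IDs are hypothetical, and the run is assumed to be an in-memory dict of the form `{qid: {docid: score}}`.

```python
def uncovered_queries(benchmark_qids, run):
    """Return the benchmark query IDs that have no results in the run, sorted."""
    return sorted(set(benchmark_qids) - set(run))


# Toy data with made-up IDs: a run keyed by one collection's query IDs
# cannot serve a benchmark whose queries use different IDs.
robust04_style_run = {"301": {"DOC-A": 12.3}, "302": {"DOC-B": 9.8}}
msmarco_style_qids = ["1048585", "2"]

missing = uncovered_queries(msmarco_style_qids, robust04_style_run)
print(f"{len(missing)} of {len(msmarco_style_qids)} queries have no first-stage results")
```

If every query comes back as missing, the results file was most likely produced for a different benchmark.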
I removed the stemmer, but I still get this error:
2021-04-14 11:28:30,658 - INFO - capreolus.task.rerank.train - Time to rank.search: 0.1716005802154541
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/capreolus/run.py", line 108, in <module>
    task_entry_function()
  File "/usr/local/lib/python3.7/dist-packages/capreolus/task/rerank.py", line 37, in traineval
    self.train()
  File "/usr/local/lib/python3.7/dist-packages/capreolus/task/rerank.py", line 48, in train
    rank_results = self.rank.evaluate()
  File "/usr/local/lib/python3.7/dist-packages/capreolus/task/rank.py", line 55, in evaluate
    self.get_results_path(), self.benchmark, primary_metric=self.config["optimize"], metrics=metrics
  File "/usr/local/lib/python3.7/dist-packages/capreolus/evaluator.py", line 155, in search_best_run
    dev_qrels = {qid: benchmark.qrels[qid] for qid in benchmark.non_nn_dev[fold_name]}
  File "/usr/local/lib/python3.7/dist-packages/capreolus/evaluator.py", line 155, in <dictcomp>
    dev_qrels = {qid: benchmark.qrels[qid] for qid in benchmark.non_nn_dev[fold_name]}
KeyError: '672'
How can I create a new static searcher? Can it only be created using Capreolus, or can it be created with other code such as Anserini? And where can I find the file for bm25staticrob04yang19 (for my reference)?
Right, that error is because the bm25staticrob04yang19 file is only for robust04. To create a new one, you just need to edit anserini.py with the path to the new run file. The run file can be generated using Anserini or any other tool that creates a TREC-format run file.
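As a rough illustration of what "TREC-format run file" means, here is a minimal sketch that writes and reads one. It only covers the file format: six whitespace-separated columns (query ID, the literal Q0, document ID, rank, score, run tag). How anserini.py should be edited to point at the file is not shown. The function names, the toy IDs, and the `bm25` tag are assumptions for illustration.

```python
import os
import tempfile


def write_trec_run(run, path, tag="bm25"):
    """Write {qid: {docid: score}} as TREC run lines: qid Q0 docid rank score tag."""
    with open(path, "w") as out:
        for qid in sorted(run):
            # Rank documents by descending score; ranks start at 1.
            ranked = sorted(run[qid].items(), key=lambda kv: kv[1], reverse=True)
            for rank, (docid, score) in enumerate(ranked, start=1):
                out.write(f"{qid} Q0 {docid} {rank} {score} {tag}\n")


def read_trec_run(path):
    """Read a TREC run file back into {qid: {docid: score}}, skipping malformed lines."""
    run = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 6:
                continue
            qid, _, docid, _, score, _ = fields
            run.setdefault(qid, {})[docid] = float(score)
    return run


# Round trip on toy data (made-up query and document IDs).
toy_run = {"q1": {"d1": 1.5, "d2": 3.0}, "q2": {"d3": 0.7}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.run")
    write_trec_run(toy_run, path)
    with open(path) as f:
        lines = f.read().splitlines()
    roundtrip = read_trec_run(path)
```

A run produced by Anserini or another retrieval tool should follow the same six-column layout, so a reader like this can be used to sanity-check a file before pointing a static searcher at it.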
I'm running the KNRM reranker on msmarcopsg. Can someone help with this error:
Exception in thread "pool-2-thread-2" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
Thank you.