Closed by WeiyeeGoh 4 years ago
Thanks for posting. I'm not really sure what's wrong here. Maybe it's a problem with the threading? There could be a race condition.
Can you try making the following changes to this file: https://github.com/Georgetown-IR-Lab/OpenNIR/blob/master/onir/datasets/index_backed.py#L123
def _init_indices_parallel(self, indices, doc_iter, force):
    needs_docs = []
    for index in indices:
        if force or not index.built():
            needs_docs.append(index)
    if needs_docs and self._confirm_dua():
        doc_iter = list(doc_iter)
        for idx in needs_docs:
            idx.build(doc_iter)
(Note: I have not tried this code.)
The change means you'll load the entire dataset into memory, but it might help work around this issue.
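To illustrate why the `list(doc_iter)` call matters when the indices are built one after another, here is a minimal sketch. `build_all` and `CountingIndex` are hypothetical names made up for this example, not OpenNIR code; the only assumption is that each index consumes the documents by iterating over them.

```python
def build_all(indices, doc_iter):
    """Feed the same documents to every index, one index at a time."""
    # Materialize once so each index sees every document.
    # Memory use grows with the size of the collection.
    docs = list(doc_iter)
    for index in indices:
        index.build(docs)


class CountingIndex:
    """Hypothetical stand-in for an index; it only counts documents."""

    def __init__(self):
        self.count = 0

    def build(self, docs):
        # Iterating is the only thing this fake index does with the docs.
        self.count = sum(1 for _ in docs)
```

Without the `list(...)` step, a one-shot iterator would be exhausted by the first index, and every later index would see no documents.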
Maybe migrating to pyserini would fix this?
I still run into the same issue: the IndexCollection class cannot be found. Here's the stack trace:
Uncaught exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/OpenNIR/onir/bin/pipeline.py", line 21, in <module>
    main()
  File "/home/ubuntu/OpenNIR/onir/bin/pipeline.py", line 17, in main
    context['pipeline'].run()
  File "/home/ubuntu/OpenNIR/onir/pipelines/default.py", line 49, in run
    self.trainer.dataset.init(force=False)
  File "/home/ubuntu/OpenNIR/onir/datasets/antique.py", line 63, in init
    self._init_indices_parallel(idxs, self._init_iter_collection(), force)
  File "/home/ubuntu/OpenNIR/onir/datasets/index_backed.py", line 132, in _init_indices_parallel
    idx.build(doc_iter)
  File "/home/ubuntu/OpenNIR/onir/indices/anserini.py", line 286, in build
    index_args = J.A_IndexArgs()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 45, in __getattr__
    self.initialize()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 64, in initialize
    self._cache[key] = self._autoclass(path)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/jnius/reflect.py", line 208, in autoclass
    c = find_javaclass(clsname)
  File "jnius/jnius_export_func.pxi", line 28, in jnius.find_javaclass
jnius.JavaException: Class not found b'io/anserini/index/IndexCollection'
I switched to Java 11 and the problem went away. I believe this is because the Anserini jar was compiled for Java 11, so an older JVM (I was on Java 1.8) cannot load its classes. Thanks!
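For anyone hitting the same error, one way to check which Java version a jar's classes target is to read the class-file header directly. The sketch below uses only the Python standard library; `class_major_version` is a hypothetical helper written for this thread, not part of OpenNIR, Anserini, or jnius. It relies on the class-file layout from the JVM specification: a 4-byte magic number `0xCAFEBABE`, then a 2-byte minor and a 2-byte major version, all big-endian.

```python
import struct
import zipfile

# Class-file major versions for a few Java releases (from the JVM spec).
MAJOR_TO_JAVA = {52: "8", 53: "9", 54: "10", 55: "11"}


def class_major_version(jar, class_path):
    """Return the class-file major version of one entry in a jar.

    `jar` is a path or a file-like object; `class_path` is the entry
    name inside the jar, e.g. 'io/anserini/index/IndexCollection.class'.
    """
    with zipfile.ZipFile(jar) as zf:
        with zf.open(class_path) as f:
            header = f.read(8)
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", header)
    if magic != 0xCAFEBABE:
        raise ValueError(f"{class_path} is not a Java class file")
    return major
```

For example, `class_major_version("bin/anserini-0.8.0-fatjar.jar", "io/anserini/index/IndexCollection.class")` returning 55 would mean the class targets Java 11, which a Java 8 JVM (maximum major version 52) refuses to load. I have not run this against that jar.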
Makes sense, good to know!
When running the command "scripts/pipeline.sh config/antique config/trivial", I ran into an issue where none of the Java classes under io.anserini.index could be found by jnius. What's strange is that all the classes in org.apache.lucene were found without problems. I checked bin/anserini-0.8.0-fatjar.jar, and all the Java classes under io.anserini.index were at their correct paths. Do you know what could be causing this issue?
I'm running this on Ubuntu 18.04, Java 1.8, and Python 3.6.9.
Here's the stack trace for the error