Georgetown-IR-Lab / OpenNIR

An end-to-end neural ad-hoc ranking pipeline.
https://opennir.net
MIT License

jnius.JavaException: Class not found b'io/anserini/index/IndexCollection' #25

Status: Closed (WeiyeeGoh closed 4 years ago)

WeiyeeGoh commented 4 years ago

When running the command "scripts/pipeline.sh config/antique config/trivial", jnius could not find any of the Java classes under io.anserini.index. Strangely, all the classes in org.apache.lucene were found without problems. I checked bin/anserini-0.8.0-fatjar.jar, and all the classes under io.anserini.index are at their correct paths. Do you know what could be causing this?

I'm running this on Ubuntu 18.04, Java 1.8, and Python 3.6.9.

Here's the stack trace for the error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/OpenNIR/onir/indices/anserini.py", line 286, in build
    index_args = J.A_IndexArgs()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 45, in __getattr__
    self.initialize()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 64, in initialize
    self._cache[key] = self._autoclass(path)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/jnius/reflect.py", line 208, in autoclass
    c = find_javaclass(clsname)
  File "jnius/jnius_export_func.pxi", line 28, in jnius.find_javaclass
jnius.JavaException: Class not found b'io/anserini/index/IndexCollection'

seanmacavaney commented 4 years ago

Thanks for posting. I'm not really sure what's wrong here. Maybe it's a problem with the threading? There could be a race condition.

Can you try making the following changes to this file: https://github.com/Georgetown-IR-Lab/OpenNIR/blob/master/onir/datasets/index_backed.py#L123

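    def _init_indices_parallel(self, indices, doc_iter, force):
        # Collect the indices that still need to be built (or all of them if forced).
        needs_docs = []
        for index in indices:
            if force or not index.built():
                needs_docs.append(index)

        if needs_docs and self._confirm_dua():
            # Load every document into memory so each index can iterate over all of them.
            doc_iter = list(doc_iter)
            # Build the indices one after another in the calling thread (no worker threads).
            for idx in needs_docs:
                idx.build(doc_iter)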

(Note: I have not tried this code.)

The change means you'll load the entire dataset into memory, but it might help work around this issue.
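To illustrate why the `list(doc_iter)` call matters in a sequential build, here is a minimal sketch. It assumes `doc_iter` is a one-shot iterator such as a generator, which I have not verified for OpenNIR; `iter_docs` and `FakeIndex` are hypothetical stand-ins, not OpenNIR classes.

```python
def iter_docs():
    """Hypothetical stand-in for the dataset's document iterator."""
    for i in range(3):
        yield f"doc-{i}"


class FakeIndex:
    """Hypothetical index that only records the documents it was given."""

    def __init__(self):
        self.docs = []

    def build(self, doc_iter):
        self.docs = list(doc_iter)


# Passing the generator directly: the first build() exhausts it,
# so the second index receives no documents.
gen = iter_docs()
a, b = FakeIndex(), FakeIndex()
a.build(gen)
b.build(gen)
print(len(a.docs), len(b.docs))  # 3 0

# Materializing the documents first lets every index see all of them,
# at the cost of holding the whole collection in memory.
docs = list(iter_docs())
c, d = FakeIndex(), FakeIndex()
for idx in (c, d):
    idx.build(docs)
print(len(c.docs), len(d.docs))  # 3 3
```

The trade-off is the one noted above: memory use grows with the size of the collection.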

Maybe migrating to pyserini would fix this?

WeiyeeGoh commented 4 years ago

I still run into the same issue with that change: the IndexCollection class cannot be found. Here's the stack trace:

 Uncaught exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/OpenNIR/onir/bin/pipeline.py", line 21, in <module>
    main()
  File "/home/ubuntu/OpenNIR/onir/bin/pipeline.py", line 17, in main
    context['pipeline'].run()
  File "/home/ubuntu/OpenNIR/onir/pipelines/default.py", line 49, in run
    self.trainer.dataset.init(force=False)
  File "/home/ubuntu/OpenNIR/onir/datasets/antique.py", line 63, in init
    self._init_indices_parallel(idxs, self._init_iter_collection(), force)
  File "/home/ubuntu/OpenNIR/onir/datasets/index_backed.py", line 132, in _init_indices_parallel
    idx.build(doc_iter)
  File "/home/ubuntu/OpenNIR/onir/indices/anserini.py", line 286, in build
    index_args = J.A_IndexArgs()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 45, in __getattr__
    self.initialize()
  File "/home/ubuntu/OpenNIR/onir/interfaces/java.py", line 64, in initialize
    self._cache[key] = self._autoclass(path)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/jnius/reflect.py", line 208, in autoclass
    c = find_javaclass(clsname)
  File "jnius/jnius_export_func.pxi", line 28, in jnius.find_javaclass
jnius.JavaException: Class not found b'io/anserini/index/IndexCollection'

WeiyeeGoh commented 4 years ago

I switched to Java 11 and the problem went away. I believe this is because Anserini was upgraded to build with Java 11, so an older JVM such as Java 1.8 fails to load the classes in the jar. Thanks!
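For anyone who wants to check this explanation against their own jar, here is a rough sketch that reads the class-file header of one entry. The helper `class_major_version` and the `MAJOR_TO_JAVA` table are names I made up for illustration; the version numbers themselves come from the Java class-file format (52 is Java 8, 55 is Java 11).

```python
import struct
import zipfile

# Class-file major versions for a few Java releases.
MAJOR_TO_JAVA = {52: "8", 53: "9", 54: "10", 55: "11"}


def class_major_version(jar_path, class_name):
    """Return the class-file major version of `class_name` inside the jar."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(entry) as f:
            # A class file starts with: magic (4 bytes), minor (2), major (2).
            header = f.read(8)
    magic, _minor, major = struct.unpack(">IHH", header)
    if magic != 0xCAFEBABE:
        raise ValueError(f"{entry} is not a Java class file")
    return major
```

Calling `class_major_version("bin/anserini-0.8.0-fatjar.jar", "io.anserini.index.IndexCollection")` and looking the result up in `MAJOR_TO_JAVA` should show which Java release the class targets. A value above 52 would mean a Java 1.8 JVM cannot load it. Running the same check on one of the org.apache.lucene classes would show whether those target an older release, which would explain why they loaded fine.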

seanmacavaney commented 4 years ago

Makes sense, good to know!