castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Simplify pyjnius configuration - jar import #678

Closed: lintool closed this issue 5 years ago

lintool commented 5 years ago

Currently, we need to do something like this:

import jnius_config
jnius_config.set_classpath("target/anserini-0.4.1-SNAPSHOT-fatjar.jar")
...

Our scripts have the jar hardcoded... is there some auto-config magic we can do?

emmileaf commented 5 years ago

Perhaps we can add a setup script like src/main/python/pyjnius_setup.py that searches for the latest jar file? (I'll send a PR after the Lucene 8 merge if this idea makes sense)

import os
import glob
import jnius_config

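def configure_classpath(anserini_root="."):
    # Collect every fatjar under <anserini_root>/target.
    paths = glob.glob(os.path.join(anserini_root, 'target', 'anserini-*-fatjar.jar'))
    if not paths:
        # max() on an empty list raises an unhelpful ValueError, so fail clearly instead.
        raise FileNotFoundError('No Anserini fatjar found under target/; build the project first')
    # Use the most recently created (or changed) jar.
    latest = max(paths, key=os.path.getctime)
    jnius_config.set_classpath(latest)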

Then, scripts within Anserini that use pyjnius can begin with:

import sys
sys.path += ['src/main/python']

from pyjnius_setup import configure_classpath
configure_classpath()
...

Scripts outside Anserini that use pyjnius can optionally specify the path to the Anserini root directory:

anserini_root = 'path/to/anserini'  # placeholder: replace with the path to your Anserini checkout

import os, sys
sys.path += [os.path.join(anserini_root, 'src/main/python')]

from pyjnius_setup import configure_classpath
configure_classpath(anserini_root)
...
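
One caveat with picking the jar by timestamp is that the most recently touched file is not necessarily the highest version. As a rough sketch of an alternative (not what the proposal above does), the jar could be chosen by parsing the version out of the filename. The helper name `find_latest_fatjar` is hypothetical, and the code assumes jars are named `anserini-<major>.<minor>.<patch>[-SNAPSHOT]-fatjar.jar`, as in the example at the top of this issue:

```python
import glob
import os
import re

# Assumed naming scheme: anserini-<major>.<minor>.<patch>[-SNAPSHOT]-fatjar.jar
_FATJAR_RE = re.compile(r"anserini-(\d+)\.(\d+)\.(\d+)(-SNAPSHOT)?-fatjar\.jar$")


def find_latest_fatjar(anserini_root="."):
    """Return the path of the highest-versioned fatjar under <anserini_root>/target.

    Hypothetical helper: it only locates the jar and does not touch pyjnius.
    """
    pattern = os.path.join(anserini_root, "target", "anserini-*-fatjar.jar")
    candidates = []
    for path in glob.glob(pattern):
        match = _FATJAR_RE.search(os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        major, minor, patch, snapshot = match.groups()
        # Compare versions numerically; a release sorts above its own SNAPSHOT.
        key = (int(major), int(minor), int(patch), snapshot is None)
        candidates.append((key, path))
    if not candidates:
        raise FileNotFoundError(f"no fatjar matching {pattern!r}; build Anserini first")
    return max(candidates)[1]
```

The result could then be handed to `jnius_config.set_classpath(find_latest_fatjar(anserini_root))`. Keeping discovery separate from classpath configuration also means the lookup can be tested without pyjnius installed.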