I would like to add a small collection of documents to a large prebuilt index. I tried to use the add_doc_raw() function of LuceneIndexer, but I got an exception:
jnius.JavaException: JVM exception occurred: cannot change field "id" from doc values type=SORTED to inconsistent doc values type=BINARY java.lang.IllegalArgumentException
Is there another way to do this?
Thanks!
Full code:
import random
import tarfile
from urllib.request import urlretrieve

from pyserini.index.lucene import IndexReader, LuceneIndexer
# LuceneSearcher is the current name of the class formerly called SimpleSearcher
from pyserini.search.lucene import LuceneSearcher

# Random suffix so repeated runs do not collide on file names
r = random.randint(0, 10000000)
collection_url = 'https://github.com/castorini/anserini-data/raw/master/CACM/lucene-index.cacm.tar.gz'
tarball_name = 'lucene-index.cacm-{}.tar.gz'.format(r)
index_dir = 'index{}/'.format(r)

# Download and unpack the prebuilt CACM index
urlretrieve(collection_url, tarball_name)
with tarfile.open(tarball_name) as tarball:
    tarball.extractall(index_dir)

index_path = f'{index_dir}lucene-index.cacm'
searcher = LuceneSearcher(index_path)
index_utils = IndexReader(index_path)
# Open the existing index in append mode
lucene_index = LuceneIndexer(index_path, append=True)

# Try to add one new document to the existing index
x = '{ "id": "doc99910294", "contents": "this is the content of bob"}'
index_utils.stats()
lucene_index.add_doc_raw(x)
lucene_index.close()
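
As a side note on the input to add_doc_raw(): below is a minimal sketch of how the raw JSON strings could be built with json.dumps instead of being written by hand, so that quotes inside the text are escaped correctly. The helper name make_raw_doc and the second sample document are hypothetical (they are not part of Pyserini or of my original code), and this does not address the doc values exception above. It only prepares the strings that would be passed to add_doc_raw().

```python
import json


def make_raw_doc(doc_id, contents):
    """Hypothetical helper: serialize one document to a JSON string
    with the same "id" / "contents" fields used in the code above."""
    return json.dumps({"id": doc_id, "contents": contents})


# Sample documents; the second one is made up to show quote escaping
new_docs = [
    ("doc99910294", "this is the content of bob"),
    ("doc99910295", 'text with "quotes" inside'),
]

# One JSON string per document, ready to be passed to add_doc_raw()
raw_docs = [make_raw_doc(doc_id, text) for doc_id, text in new_docs]

for raw in raw_docs:
    print(raw)
```

Each string in raw_docs has the same shape as x in the code above, so the loop body could call lucene_index.add_doc_raw(raw) once the indexing problem itself is resolved.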