Open ftvalentini opened 1 month ago
It would be nice to add support for reading `jsonl.gz` files when encoding a corpus with a dense encoder via `python -m pyserini.encode`, in:
https://github.com/castorini/pyserini/blob/b7e1da305dd31b195244d49321087505996260c6/pyserini/encode/_base.py#L133
Maybe with:
    import gzip
    # ...
    # gzip.open defaults to binary mode, so request text mode explicitly
    open_handle = gzip.open if filename.endswith(".gz") else open
    with open_handle(filename, "rt") as f:
        # ...
This way, both `pyserini.index.lucene` and `pyserini.encode` would accept `jsonl.gz` files as input.
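For reference, here is a minimal self-contained sketch of the idea, outside of Pyserini. The helper name `iter_jsonl` is hypothetical and is not an existing Pyserini function; the actual change would go into the loading code linked above. It assumes the files are UTF-8 encoded and that gzip-compressed files can be recognized by a `.gz` suffix.

```python
import gzip
import json
from typing import Iterator


def iter_jsonl(filename: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl or .jsonl.gz file.

    Hypothetical helper for illustration only; not part of Pyserini.
    """
    # Choose the opener from the file extension. Both openers are called in
    # text mode so that lines are str in either case.
    open_handle = gzip.open if filename.endswith(".gz") else open
    with open_handle(filename, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Passing `"rt"` matters because `gzip.open` opens files in binary mode by default, whereas the built-in `open` defaults to text mode.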