castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Optimizations when building a dense index #1910

Open ftvalentini opened 1 month ago

ftvalentini commented 1 month ago

In the main class for building dense indexes:

https://github.com/castorini/pyserini/blob/b7e1da305dd31b195244d49321087505996260c6/pyserini/encode/_auto.py#L24

the fp16 argument does not seem to be used anywhere (it ends up in kwargs on this line: https://github.com/castorini/pyserini/blob/b7e1da305dd31b195244d49321087505996260c6/pyserini/encode/_auto.py#L38). It could be applied in __init__ as:

self.model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
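The line above always loads in half precision. Here is a minimal sketch of honoring the flag instead, so that fp16=False keeps full precision. The names SketchEncoder and resolve_dtype are hypothetical and not part of Pyserini, and a tiny nn.Linear stands in for the Hugging Face model so the snippet runs without downloading weights. In the real class the resolved dtype would presumably be passed as torch_dtype to AutoModel.from_pretrained.

```python
import torch
from torch import nn


def resolve_dtype(fp16: bool) -> torch.dtype:
    """Map the boolean fp16 flag to a torch dtype (hypothetical helper)."""
    return torch.float16 if fp16 else torch.float32


class SketchEncoder:
    """Hypothetical stand-in for the encoder class discussed above."""

    def __init__(self, hidden_size: int = 8, fp16: bool = False):
        self.dtype = resolve_dtype(fp16)
        # Stand-in for the real call, which would look like:
        #   AutoModel.from_pretrained(model_name, torch_dtype=self.dtype)
        self.model = nn.Linear(hidden_size, hidden_size).to(self.dtype)
        self.model.eval()


half_encoder = SketchEncoder(fp16=True)
full_encoder = SketchEncoder(fp16=False)

# Parameters carry the dtype selected by the flag.
print(next(half_encoder.model.parameters()).dtype)
print(next(full_encoder.model.parameters()).dtype)
```

The sketch only inspects parameter dtypes and does not run a forward pass, since half-precision inference is normally done on a GPU.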

Moreover, the encode() method could use torch.inference_mode(), like:

with torch.inference_mode():
    outputs = self.model(**inputs)

which should reduce the memory footprint of inference, since autograd would no longer record the forward pass.
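To illustrate what the context manager changes, here is a small sketch that compares a tracked forward pass with one run under torch.inference_mode(). The toy nn.Sequential model and random inputs are assumptions made for illustration and stand in for the actual encoder and tokenized batch.

```python
import torch
from torch import nn

# Toy model standing in for the encoder (an assumption for illustration).
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
model.eval()  # affects layers like dropout; it does not disable autograd
inputs = torch.randn(2, 16)

# Default behaviour: the output is attached to an autograd graph.
tracked = model(inputs)

# Under inference_mode no graph is recorded for the forward pass.
with torch.inference_mode():
    untracked = model(inputs)

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.is_inference())
```

One caveat: tensors created under inference_mode cannot later take part in autograd, so torch.no_grad() is the less strict alternative if the embeddings might be reused in a training step.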