Optimizations when building a dense index

In the main class for building dense indexes:

The fp16 argument does not seem to be used anywhere (it is only absorbed into kwargs on this line: https://github.com/castorini/pyserini/blob/b7e1da305dd31b195244d49321087505996260c6/pyserini/encode/_auto.py#L38). It could be honored in __init__ by loading the model in half precision, for example:

self.model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
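
To make the half-precision load depend on the flag rather than always being on, the idea could be sketched roughly as below. The helper names select_dtype and load_encoder are hypothetical and not part of pyserini, and falling back to float32 on CPU is an assumption on my side, not something the current code does.

```python
import torch


def select_dtype(fp16: bool, device: str) -> torch.dtype:
    """Pick the dtype to load weights in (hypothetical helper, not in pyserini)."""
    # Assumption: only use half precision on GPU, since fp16 on CPU can be
    # slow or unsupported for some ops.
    if fp16 and device.startswith("cuda"):
        return torch.float16
    return torch.float32


def load_encoder(model_name: str, device: str = "cpu", fp16: bool = False):
    """Load a Hugging Face model while honoring the fp16 flag (hypothetical helper)."""
    # Imported lazily so select_dtype can be used without transformers installed.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        model_name, torch_dtype=select_dtype(fp16, device)
    )
    # eval() disables dropout; it does not by itself disable autograd tracking.
    return model.to(device).eval()
```

With something like this, the encoder's __init__ would pass its existing fp16 argument through instead of dropping it.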

Moreover, the encode() method could run the forward pass under torch.inference_mode(), like:

with torch.inference_mode():
    outputs = self.model(**inputs)

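which disables autograd tracking, so intermediate activations are not kept for a backward pass that never happens. This should reduce the memory footprint of inference.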
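
A small self-contained check of what inference_mode() changes, using a tiny nn.Linear as a stand-in for the real encoder (the stand-in and the variable names are only for illustration):

```python
import torch
from torch import nn

# Stand-in for the encoder; its parameters require grad by default,
# just like a model returned by from_pretrained.
model = nn.Linear(8, 4).eval()
inputs = torch.randn(2, 8)

# Plain forward pass: autograd records the graph even though we never call backward().
tracked = model(inputs)

# Same forward pass under inference_mode: no graph is recorded.
with torch.inference_mode():
    untracked = model(inputs)

print("tracked:  ", tracked.requires_grad, tracked.grad_fn is not None)
print("untracked:", untracked.requires_grad, untracked.grad_fn is None)
```

The outputs are numerically the same in both cases; only the autograd bookkeeping differs. Note that eval() alone does not turn that bookkeeping off.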
