Open transcend-0 opened 1 month ago
The default `dtype` is `auto` (this may vary depending on the specific model; you can check the model code). You can pass the `dtype` argument in `model_args` when running the evaluation (`float32` or `float16`; these map to `torch_dtype`).
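
For illustration, here is a minimal sketch of what a `dtype` setting typically translates to inside a model wrapper; the checkpoint name is a placeholder, not something from this thread:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: a dtype string passed via model_args (e.g. "float16") is
# usually forwarded to transformers as torch_dtype when the model is loaded.
dtype = "float16"  # could also be "float32" or "auto"
torch_dtype = "auto" if dtype == "auto" else getattr(torch, dtype)

model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",        # placeholder checkpoint
    torch_dtype=torch_dtype,  # "auto" lets transformers read the dtype from the config
)
```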
Alternatively, you can directly modify the code in the `model_name.py` file to use `load_in_8bit` or `load_in_4bit`, similar to this or this. Simply add `load_in_8bit=True`:
```python
self.model = AutoModelForCausalLM.from_pretrained(..., ..., ..., load_in_8bit=True, ...)
```
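
For a fuller sketch of 8-bit loading (assuming the `bitsandbytes` package is installed and a CUDA GPU is available), note that recent transformers releases prefer passing a `BitsAndBytesConfig` instead of the bare `load_in_8bit` flag; the checkpoint name below is a placeholder:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: 8-bit (or 4-bit) weight loading via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",                 # placeholder checkpoint
    quantization_config=quant_config,  # replaces the deprecated load_in_8bit kwarg
    device_map="auto",                 # bitsandbytes needs the model placed on GPU
)
```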
Quantized models from HF don't need extra args like `load_in_4bit`.
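
As an illustration of that point, a checkpoint published already quantized (e.g. GPTQ or AWQ) ships its quantization settings in its own config, so a plain `from_pretrained` call is typically enough; the model name here is made up:

```python
from transformers import AutoModelForCausalLM

# Sketch only: pre-quantized checkpoints carry quantization_config in
# config.json, so no load_in_8bit / load_in_4bit argument is needed
# (the matching backend, e.g. auto-gptq or autoawq, must be installed).
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model-GPTQ",  # hypothetical pre-quantized checkpoint
    device_map="auto",
)
```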
How do I set the model precision when evaluating, e.g. "fp16" or "load_in_8bit"? And what's the default precision? It seems to be "fp16", because around 14 GB of GPU memory is occupied when LLaVA-1.5-7B is loaded during the evaluation process.
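
A quick back-of-the-envelope check of that observation: 7B parameters at 2 bytes each (fp16/bf16) is roughly 14 GB of weights, which matches the memory you are seeing, so fp16 is a plausible default here.

```python
# Rough weight-memory estimate (ignores activations, KV cache, CUDA overhead)
params = 7e9          # ~7B parameters for LLaVA-1.5-7B
bytes_per_param = 2   # fp16 / bf16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~14 GB
```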