Open transcend-0 opened 1 month ago
The default `dtype` is `auto` (this may vary depending on the specific model; you can check the model code). You can pass the `dtype` argument in `model_args` when running the evaluation (`float32` or `float16`; these map to `torch_dtype`).
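
For illustration, here is a minimal sketch of what a `dtype` setting typically translates to inside a model wrapper; the checkpoint name is a placeholder, not something from this thread:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: a dtype string passed via model_args (e.g. "float16") is
# usually forwarded to transformers as torch_dtype when the model is loaded.
dtype = "float16"  # could also be "float32" or "auto"
torch_dtype = "auto" if dtype == "auto" else getattr(torch, dtype)

model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",        # placeholder checkpoint
    torch_dtype=torch_dtype,  # "auto" lets transformers read the dtype from the config
)
```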
Alternatively, you can directly modify the code in the `model_name.py` file to use `load_in_8bit` or `load_in_4bit`, similar to this or this. Simply add `load_in_8bit=True`:
```python
self.model = AutoModelForCausalLM.from_pretrained(..., ..., ..., load_in_8bit=True, ...)
```
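
For a fuller sketch of 8-bit loading (assuming the `bitsandbytes` package is installed and a CUDA GPU is available), note that recent transformers releases prefer passing a `BitsAndBytesConfig` instead of the bare `load_in_8bit` flag; the checkpoint name below is a placeholder:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: 8-bit (or 4-bit) weight loading via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

model = AutoModelForCausalLM.from_pretrained(
    "your-model-name",                 # placeholder checkpoint
    quantization_config=quant_config,  # replaces the deprecated load_in_8bit kwarg
    device_map="auto",                 # bitsandbytes needs the model placed on GPU
)
```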
Quantized models from HF don't need extra args like `load_in_4bit`.
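
As an illustration of that point, a checkpoint published already quantized (e.g. GPTQ or AWQ) ships its quantization settings in its own config, so a plain `from_pretrained` call is typically enough; the model name here is made up:

```python
from transformers import AutoModelForCausalLM

# Sketch only: pre-quantized checkpoints carry quantization_config in
# config.json, so no load_in_8bit / load_in_4bit argument is needed
# (the matching backend, e.g. auto-gptq or autoawq, must be installed).
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model-GPTQ",  # hypothetical pre-quantized checkpoint
    device_map="auto",
)
```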
How do I set the model precision when evaluating, e.g. "fp16" or "load_in_8bit"? And what's the default precision? It seems to be "fp16", because around 14 GB of GPU memory is occupied when LLaVA-1.5-7B is loaded during the evaluation process.
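
A quick back-of-the-envelope check of that observation: 7B parameters at 2 bytes each (fp16/bf16) is roughly 14 GB of weights, which matches the memory you are seeing, so fp16 is a plausible default here.

```python
# Rough weight-memory estimate (ignores activations, KV cache, CUDA overhead)
params = 7e9          # ~7B parameters for LLaVA-1.5-7B
bytes_per_param = 2   # fp16 / bf16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~14 GB
```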