Open IceTea42 opened 1 month ago
You can quantize our model at run time using bitsandbytes. Please follow the steps below.

1. Install bitsandbytes:

```shell
pip install bitsandbytes
```

2. Modify the `vlmeval/vlm/hpt.py` file as follows.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# global_model_path, torch_dtype, and self come from the surrounding
# model class in hpt.py
llm = AutoModelForCausalLM.from_pretrained(
    global_model_path,
    quantization_config=quant_config,
    subfolder='llm',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch_dtype,
    device_map='cuda',
)
self.llm = llm
```
3. Then you can proceed to run the demo:

```shell
python demo/demo.py --image_path demo/einstein.jpg --text 'Question: What is unusual about this image?\nAnswer:' --model hpt-air-demo-local
```
This is the output on my end.
> The unusual aspect of this image is the depiction of a famous scientist, such as Albert Einstein, holding a cell phone. It is not common to see a renowned scientist using a cell phone, as they are often associated with more traditional communication methods or research. The painting or drawing of Einstein holding a cell phone is an interesting and unexpected representation of the scientist's daily life or activities.
Hi, thank you for this work. How can I quantize it to use int8? Any comments are appreciated.