Open IceTea42 opened 1 month ago
You can quantize our model at run time using bitsandbytes. Please follow the steps below.

1. Install bitsandbytes:

```shell
pip install bitsandbytes
```

2. Modify the `vlmeval/vlm/hpt.py` file as follows.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# global_model_path, torch_dtype, and self come from the surrounding
# model class in hpt.py
llm = AutoModelForCausalLM.from_pretrained(
    global_model_path,
    quantization_config=quant_config,
    subfolder='llm',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch_dtype,
    device_map='cuda',
)
self.llm = llm
```
3. Then you can proceed to run the demo:

```shell
python demo/demo.py --image_path demo/einstein.jpg --text 'Question: What is unusual about this image?\nAnswer:' --model hpt-air-demo-local
```
This is the output on my end.
> The unusual aspect of this image is the depiction of a famous scientist, such as Albert Einstein, holding a cell phone. It is not common to see a renowned scientist using a cell phone, as they are often associated with more traditional communication methods or research. The painting or drawing of Einstein holding a cell phone is an interesting and unexpected representation of the scientist's daily life or activities.
Hi, thank you for this work. How can I quantize it to use int8? Any comments are appreciated.