ApoorvFrontera opened this issue 1 month ago
According to the BitsAndBytesConfig documentation, all of the linear layers are replaced by FP4/NF4 layers when load_4bit is set, so loading the 16-bit projector weights into the quantized model reports a size-mismatch error.
A temporary workaround is to initialize an unquantized model, load the projector weights, and save the whole model; the saved checkpoint can then be loaded successfully with load_4bit=True. For example:
from videollama2 import model_init

# Initialize the unquantized model; the 16-bit projector weights load cleanly here.
model, processor, tokenizer = model_init('DAMO-NLP-SG/VideoLLaMA2-7B-Base')
model.config.tune_mm_mlp_adapter = False          # treat the checkpoint as a full model, not an adapter
model.save_pretrained('VideoLLaMA2-7B-full')      # save the merged weights
tokenizer.save_pretrained('VideoLLaMA2-7B-full')
# Reload the merged checkpoint with 4-bit quantization enabled.
model, processor, tokenizer = model_init('VideoLLaMA2-7B-full', load_4bit=True)
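The mismatch that this workaround sidesteps comes from how 4-bit quantization stores weights. Below is a minimal sketch using bitsandbytes directly (it assumes a CUDA device; the exact packed shape may vary with the bitsandbytes version):

import torch
import bitsandbytes as bnb

# An fp16 linear layer keeps its weight as an [out_features, in_features] matrix.
fp16_linear = torch.nn.Linear(4096, 4096, bias=False, dtype=torch.float16)
print(fp16_linear.weight.shape)                           # torch.Size([4096, 4096])

# After quantization, Linear4bit packs the weight into a uint8 tensor of a
# different shape, so a 16-bit state_dict entry no longer fits the parameter.
nf4_linear = bnb.nn.Linear4bit(4096, 4096, bias=False, quant_type="nf4")
nf4_linear = nf4_linear.cuda()                            # packing to NF4 happens on device transfer
print(nf4_linear.weight.shape, nf4_linear.weight.dtype)   # roughly torch.Size([8388608, 1]) torch.uint8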
Hi VideoLLaMA Team,
I am facing issues while loading all of the base models in 4-bit precision. The following lines try to load the mm_projector_weights, which are stored in 16-bit precision, into a model whose layers are already quantized to 4-bit, which leads to errors.
Code used for loading the models for inference:
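(The original snippet is not shown; the following is a minimal version of the loading call, with the arguments assumed from the rest of this report.)

from videollama2 import model_init

# Loading a base model with 4-bit quantization enabled quantizes every linear
# layer, including the projector, before the 16-bit mm_projector_weights are
# loaded on top of it.
model, processor, tokenizer = model_init('DAMO-NLP-SG/VideoLLaMA2-7B-Base', load_4bit=True)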
Problematic part of the code: https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/main/videollama2/model/__init__.py#L171-L172
Error:
How can we use the 16-bit mm_projector_weights in a 4-bit quantized model?
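A possible alternative to the save-and-reload workaround above is to keep the projector out of quantization entirely, so its 16-bit weights still match. Whether model_init exposes a hook for a custom quantization config is an assumption; with a plain transformers BitsAndBytesConfig it would look roughly like this (the "mm_projector" module name is also an assumption):

import torch
from transformers import BitsAndBytesConfig

# Sketch: quantize the linear layers to NF4 but skip the multimodal projector,
# leaving it in 16-bit so the stored mm_projector weights still load.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_skip_modules=["mm_projector"],
)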