mirek190 opened this issue 3 months ago
Is it natively supported once someone converts it to gguf?
Someone has to write the code to run such a model into llama.cpp. Then it would be a model you could convert to gguf. Until then, no.
I'm patiently waiting for someone to do that ...
I've tried to convert the phi-3-vision-128k-instruct HF model to GGUF. But it looks like the current version of llama.cpp does not support the vision part of the model (model.vision_embed_tokens, etc.) in Phi-3v. After adding "Phi3VForCausalLM" to convert-hf-to-gguf.py (copied from the "Phi3ForCausalLM" handler), the run ends like this:
...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|> ' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {3072, 32064}
...
...
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 330, in write
    self.write_tensors()
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 266, in write_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 233, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 184, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'
Tensor names like 'model.vision_embed_tokens.glb_GN' are not listed in the "TensorNameMap" of the tensor_mapping.py file. The additional vision tensors in Phi-3v can be seen here: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main?show_file_info=model.safetensors.index.json
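For reference, a rough sketch of the kind of change being described, hedged because the exact name of the Phi-3 handler class (Phi3MiniModel below) and the choice to simply drop the vision tensors are assumptions on my side, not merged code:

@Model.register("Phi3VForCausalLM")
class Phi3VisionModel(Phi3MiniModel):  # assumes the "Phi3ForCausalLM" handler is Phi3MiniModel
    model_arch = gguf.MODEL_ARCH.PHI3

    def modify_tensors(self, data_torch, name, bid):
        # 'model.vision_embed_tokens.*' holds the CLIP encoder and projector,
        # which have no TensorNameMap entries; skip them so map_tensor_name()
        # is never asked to map them.
        if name.startswith("model.vision_embed_tokens."):
            return []
        return super().modify_tensors(data_torch, name, bid)

With something like this the conversion can finish, but the resulting GGUF is text-only; the image path still needs llava-style support in llama.cpp itself.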
Is it possible to make llama.cpp support multimodal models like llava and Phi-3v?
The model is very good for its size on OCR tasks; looking forward to using it in GGUF format.
Hi @ggerganov, Phi-3 vision is similar to llava: it combines the Phi-3 and CLIP-ViT-Large-patch14-336 models. Is it possible to support converting it from HF to GGUF?
Any update on the convert-hf-to-gguf issue on the Phi3-vision-small-128k model? Seems to be giving the same error as above:
ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported
You can copy "Phi3ForCausalLM" section and add as "Phi3VForCausalLM" in this python file. But Phi3-vision-128k-instruct includes Phi3 and Clip model. Phi3 can be detected and converted, but Clip model can't be converted via convert-hf-to-gguf.py code. It prompt the tensor mapping fail.
I did exactly that, as mentioned in the messages above in this issue, and got the exact same problem. Are there any workarounds for this, e.g. somehow decoupling the two models?
You can use examples/llava/llava-surgery-v2.py to separate out the CLIP part. I was able to modify it to do so successfully. I'm a bit stuck on the rest... the easiest way to do this, imo, is to modify the code under examples/llava/ to accept the Phi-3 base model and this split-off CLIP encoder.
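For anyone trying the same thing, here is a minimal sketch of that surgery idea (not the actual llava-surgery-v2.py logic; the shard paths and output file names are made up), keyed off the model.vision_embed_tokens. prefix from the error above:

import glob
from safetensors.torch import load_file, save_file

text_tensors, vision_tensors = {}, {}
for shard in sorted(glob.glob("Phi-3-vision-128k-instruct/model-*.safetensors")):
    for name, tensor in load_file(shard).items():
        # Everything under model.vision_embed_tokens.* belongs to the CLIP encoder/projector.
        bucket = vision_tensors if name.startswith("model.vision_embed_tokens.") else text_tensors
        bucket[name] = tensor

save_file(text_tensors, "phi3v-text-only.safetensors")       # candidate input for convert-hf-to-gguf.py
save_file(vision_tensors, "phi3v-vision-tower.safetensors")  # to be converted separately, llava-style

The text-only half would still need a matching index/config before convert-hf-to-gguf.py picks it up, and the vision half would have to be converted llava-style into an mmproj-like file.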
Would it be possible to use a parameter in the GGUF header to tell it that the file contains two sets of tensor data?
I feel like the typical user will expect to use a single GGUF file.
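The format itself wouldn't stop that; a sketch with the gguf-py writer is below. The phi3v.* keys and the dummy zero tensors are purely hypothetical, invented for illustration; the missing piece is loader code in llama.cpp that actually understands such metadata.

import numpy as np
import gguf

writer = gguf.GGUFWriter("phi3v-combined.gguf", arch="phi3")
# Hypothetical metadata a loader could check before splitting the tensor set in two:
writer.add_bool("phi3v.has_vision_encoder", True)
writer.add_string("phi3v.vision_tensor_prefix", "v.")
# Text and vision tensors sit side by side, distinguished only by a name prefix
# (dummy zero tensors here just to keep the sketch self-contained):
writer.add_tensor("token_embd.weight", np.zeros((32064, 3072), dtype=np.float32))
writer.add_tensor("v.patch_embd.weight", np.zeros((1024, 1176), dtype=np.float32))
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()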
bad bot
sad but true
New release of Phi-3.5-vision-instruct today: https://huggingface.co/microsoft/Phi-3.5-vision-instruct
(As well as a 16x3.8B MoE and an updated version of the basic Phi-3.5-mini)
+1 for support
@coder543 Can it be converted to GGUF and used as a vision model?
@Milor123 Nope… that's why this issue exists.
Abetlen has already converted it and is trying to create an experimental branch: https://huggingface.co/abetlen/Phi-3.5-vision-instruct-gguf
Is there code to use Phi-3.5-vision-instruct-gguf with images locally via llama-cpp-python?
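There is no confirmed recipe in this thread, but a hedged sketch of what it would look like through llama-cpp-python's existing llava-style API follows. Whether abetlen's experimental GGUF actually works through Llava15ChatHandler, and the local file names used here, are assumptions, not verified facts:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Hypothetical file names for the text model and the separate vision projector:
chat_handler = Llava15ChatHandler(clip_model_path="phi-3.5-vision-mmproj-f16.gguf")
llm = Llama(
    model_path="phi-3.5-vision-text-q4_k_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for the image embedding tokens plus the reply
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///tmp/receipt.png"}},
                {"type": "text", "text": "Transcribe the text in this image."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])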
That model is insane for its size ....
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct