ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.94k stars 9.31k forks

FR: Phi-3-vision-128k-instruct implementation #7444

Open mirek190 opened 3 months ago

mirek190 commented 3 months ago

That model is insane for its size ....

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

simsi-andy commented 3 months ago

Is it natively supported once someone converts it to gguf?

4onen commented 3 months ago

> Is it natively supported once someone converts it to gguf?

Someone has to add the code that runs such a model to llama.cpp. Then it would be a model you could convert to gguf. Until then, no.

mirek190 commented 3 months ago

I'm patiently waiting for someone to do that… 😭

HaoHoo commented 3 months ago

I've tried to convert the Phi-3-vision-128k-instruct HF model to GGUF, but it looks like the current version of llama.cpp does not support the vision components (model.vision_embed_tokens, etc.) in Phi-3v. After adding "Phi3VForCausalLM" to convert-hf-to-gguf.py as a copy of "Phi3ForCausalLM", the run fails as shown below:

...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|> ' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {3072, 32064}
...
...
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 330, in write
    self.write_tensors()
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 266, in write_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 233, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 184, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'

Tensor names like 'model.vision_embed_tokens.glb_GN' are not listed in the "TensorNameMap" in tensor_mapping.py. The additional tensors in Phi-3v can be inspected here: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main?show_file_info=model.safetensors.index.json
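The mapping failure can be reproduced in miniature: the converter walks every tensor name and raises on anything it cannot map. Below is an illustrative workaround, not llama.cpp code: skip tensors under the prefix that appears in the ValueError above, so only the Phi-3 text-model weights would be exported.

```python
# Illustrative sketch, not part of convert-hf-to-gguf.py: filter out the
# Phi-3-vision tensors that have no entry in TensorNameMap, keeping only
# the text model. The prefix comes from the ValueError in the log above.

VISION_PREFIX = "model.vision_embed_tokens."

def is_language_tensor(name: str) -> bool:
    """True for tensors that belong to the Phi-3 text model."""
    return not name.startswith(VISION_PREFIX)

tensors = [
    "model.embed_tokens.weight",
    "model.vision_embed_tokens.glb_GN",  # the tensor that fails mapping
    "model.layers.0.self_attn.qkv_proj.weight",
]
language_only = [t for t in tensors if is_language_tensor(t)]
print(language_only)
```

Skipping the vision tensors only yields a text-only GGUF, of course; the vision encoder still needs separate handling.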

Is it possible for llama.cpp to support multimodal models like LLaVA and Phi-3v?

DenisSergeevitch commented 3 months ago

The model is very good for its size on OCR tasks; looking forward to using it in GGUF format.

HaoHoo commented 3 months ago

Hi @ggerganov, Phi-3 vision is similar to LLaVA: it combines Phi-3 with the CLIP-ViT-Large-patch14-336 model. Would it be possible to support converting it from HF to GGUF?

anuran-roy commented 3 months ago

Any update on the convert-hf-to-gguf issue with the Phi3-vision-small-128k model? It seems to give the same error as above:

ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

HaoHoo commented 3 months ago

> Any update on the convert-hf-to-gguf issue with the Phi3-vision-small-128k model? It seems to give the same error as above:
>
> ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct includes both a Phi3 model and a CLIP model. The Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py; it fails with the tensor-mapping error.

anuran-roy commented 3 months ago

> Any update on the convert-hf-to-gguf issue with the Phi3-vision-small-128k model? It seems to give the same error as above: ERROR:hf-to-gguf:Model Phi3VForCausalLM is not supported

> You can copy the "Phi3ForCausalLM" section and add it as "Phi3VForCausalLM" in that Python file. But Phi-3-vision-128k-instruct includes both a Phi3 model and a CLIP model. The Phi3 part can be detected and converted, but the CLIP part can't be converted by convert-hf-to-gguf.py; it fails with the tensor-mapping error.

I did exactly that, as mentioned earlier in this issue, and got the exact same problem. Are there any workarounds, such as somehow decoupling the two models?

farris commented 3 months ago

You can use examples/llava/llava-surgery-v2.py to separate out the CLIP part; I was able to modify it to do so successfully. I'm a bit stuck on the rest... the easiest way to do this, imo, is to modify the code under examples/llava/ to accept the Phi-3 base model and this detached CLIP encoder.
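The surgery step described above boils down to partitioning the checkpoint by tensor-name prefix. A rough sketch of the idea, with plain strings standing in for real tensors (the actual llava-surgery-v2.py works on torch/safetensors checkpoints and also saves both halves to disk):

```python
# Sketch of the "surgery" idea: partition a checkpoint's tensors into the
# CLIP vision encoder and the Phi-3 base model by name prefix. Strings
# stand in for real tensors here.

VISION_PREFIX = "model.vision_embed_tokens."

def split_checkpoint(state: dict) -> tuple[dict, dict]:
    """Return (vision_tensors, language_tensors)."""
    clip, base = {}, {}
    for name, tensor in state.items():
        (clip if name.startswith(VISION_PREFIX) else base)[name] = tensor
    return clip, base

state = {
    "model.vision_embed_tokens.glb_GN": "tensor-a",
    "model.layers.0.self_attn.qkv_proj.weight": "tensor-b",
}
clip_part, base_part = split_checkpoint(state)
print(len(clip_part), len(base_part))  # 1 1
```

The language half could then go through the normal Phi-3 conversion path, while the CLIP half would need its own projector/encoder handling under examples/llava/.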

farris commented 3 months ago

https://github.com/ggerganov/llama.cpp/pull/7705 👁️

BrainSlugs83 commented 2 months ago

Would it be possible to use a parameter in the GGUF header to tell it that the file contains two sets of tensor data?

I feel like the typical user will expect a single GGUF file.
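Purely as an illustration of the suggestion above (GGUF does not currently define such a layout for this model, and the metadata key and name prefixes below are hypothetical), one file could carry both tensor sets by namespacing the names and flagging the split in the header metadata:

```python
# Illustrative only: the metadata key and the "language."/"vision."
# prefixes are hypothetical, sketching how one container could hold two
# tensor sets that a reader can tell apart.
container = {
    "metadata": {"general.has_vision_encoder": True},  # hypothetical key
    "tensors": {},
}
for name in ["token_embd.weight"]:        # language-model tensors
    container["tensors"]["language." + name] = b"..."
for name in ["patch_embd.weight"]:        # vision-encoder tensors
    container["tensors"]["vision." + name] = b"..."

print(sorted(container["tensors"]))
# ['language.token_embd.weight', 'vision.patch_embd.weight']
```

In practice llama.cpp's LLaVA path keeps the vision encoder in a separate mmproj GGUF file instead, which is the convention the current workarounds follow.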

Aisuko commented 1 month ago

bad bot

muzhig commented 1 month ago

sad but true

coder543 commented 3 weeks ago

New release of Phi-3.5-vision-instruct today: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

(As well as a 16x3.8B MoE and an updated version of the basic Phi-3.5-mini)

stygmate commented 2 weeks ago

+1 for support

Milor123 commented 2 weeks ago

@coder543 Can it be converted to GGUF and used as a vision model?

coder543 commented 2 weeks ago

@Milor123 Nope… that's why this issue exists.

simsi-andy commented 2 weeks ago

Abetlen has already converted it and is working on an experimental branch: https://huggingface.co/abetlen/Phi-3.5-vision-instruct-gguf

daboe01 commented 2 weeks ago

https://github.com/ggerganov/llama.cpp/pull/9209/

ayttop commented 1 week ago

Is there code to run Phi-3.5-vision-instruct-gguf with an image locally via llama-cpp-python?