Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

Error converting MobileVLM-V2 with llama.cpp #29

Open shinerdeng opened 8 months ago

shinerdeng commented 8 months ago

Following this guide, https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/MobileVLM-README.md, running `python convert.py MobileVLM_V2-1.7B` fails with:

    Traceback (most recent call last):
      File "/root/llama.cpp_MobileVLM/llama.cpp/convert.py", line 1483, in <module>
        main()
      File "/root/llama.cpp_MobileVLM/llama.cpp/convert.py", line 1469, in main
        model = convert_model_names(model, params, args.skip_unknown)
      File "/root/llama.cpp_MobileVLM/llama.cpp/convert.py", line 1206, in convert_model_names
        raise Exception(f"Unexpected tensor name: {name}. Use --skip-unknown to ignore it (e.g. LLaVA)")
    Exception: Unexpected tensor name: model.vision_tower.vision_tower.vision_model.embeddings.class_embedding. Use --skip-unknown to ignore it (e.g. LLaVA)
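The error means `convert.py` only knows how to map the language-model tensors; the vision-tower tensors that `llava-surgery-v2.py` is supposed to split out are "unexpected" to it. As an illustration only (a simplified sketch, not the actual `convert.py` logic or its real prefix list), the check that `--skip-unknown` relaxes behaves roughly like this:

```python
# Hypothetical sketch of the tensor-name check in convert.py.
# KNOWN_PREFIXES is illustrative, not the real mapping table.
KNOWN_PREFIXES = ("model.layers.", "model.embed_tokens.", "model.norm.", "lm_head.")

def filter_tensors(names, skip_unknown=False):
    kept = []
    for name in names:
        if name.startswith(KNOWN_PREFIXES):
            kept.append(name)
        elif skip_unknown:
            continue  # silently drop e.g. leftover vision-tower tensors
        else:
            raise Exception(f"Unexpected tensor name: {name}. "
                            "Use --skip-unknown to ignore it (e.g. LLaVA)")
    return kept

names = [
    "model.embed_tokens.weight",
    "model.vision_tower.vision_tower.vision_model.embeddings.class_embedding",
]
```

With `skip_unknown=True` only the language-model tensor survives; with the default `False` the second name raises, reproducing the traceback above. In other words, hitting this exception usually indicates the vision tower was not stripped from the checkpoint as the surgery step intends.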

YangYang-DLUT commented 8 months ago

The latest version of llama.cpp has been updated; check https://github.com/Meituan-AutoML/MobileVLM/issues/26#issuecomment-1960925584.

Please try this 😃 If you hit any problem, I will be here to check it for you.

shinerdeng commented 8 months ago

> The latest version of llama.cpp has been updated, check #26 (comment). Please try this 😃 If any problem I will be here check it for you.

You mean this repo, right: https://github.com/XiaotaoChen/llama.cpp/tree/MobileVLM-PEG? I still get the same error; the failing step is `python convert.py /root/llama.cpp_MobileVLM/MobileVLM_V2-1.7B`. Here is my command history:

    python examples/llava/llava-surgery-v2.py -m /root/llama.cpp_MobileVLM/MobileVLM_V2-1.7B
    python ./examples/llava/convert-image-encoder-to-gguf.py -m ~/llama.cpp_MobileVLM/clip-vit-large-patch14-336 --llava-projector ~/llama.cpp_MobileVLM/MobileVLM_V2-1.7B/llava.projector --output-dir ~/llama.cpp_MobileVLM/MobileVLM_V2-1.7B/ --projector-type peg
    python convert.py /root/llama.cpp_MobileVLM/MobileVLM_V2-1.7B

YangYang-DLUT commented 8 months ago

Looks like there are some conflicts; we are working on it. A GGUF-format MobileVLM V2 1.7B is provided here: Google Drive

Change

llama.cpp/examples/llava/clip.cpp
...
line 123 #define TN_MVLM_PROJ_PEG_MLP "mm.mlp.%d.%s"
line 124 #define TN_MVLM_PROJ_PEG "mm.peg.proj.%d.%s"
...
line 857             embeddings = ggml_mul(ctx0, embeddings, model.mm_peg_ls_zeta);
...
line 1236            vision_model.mm_peg_ls_zeta = get_tensor(new_clip->ctx_data, "mm.peg.ls.zeta");
...

to

...
#define TN_MVLM_PROJ_PEG_MLP "mm.mlp.mlp.%d.%s"
#define TN_MVLM_PROJ_PEG "mm.peg.peg.%d.%s"
...
            // embeddings = ggml_mul(ctx0, embeddings, model.mm_peg_ls_zeta);
...
            // vision_model.mm_peg_ls_zeta = get_tensor(new_clip->ctx_data, "mm.peg.ls.zeta");

then compile again and run llava-cli with the GGUF models.
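The two `#define` changes above are needed because the GGUF produced for MobileVLM V2 stores the projector weights under doubled name segments (`mm.mlp.mlp.…`, `mm.peg.peg.…`), while clip.cpp was looking them up with single-segment patterns. Since clip.cpp builds the lookup key with a printf-style pattern, the mismatch is easy to see; Python's `%` operator formats the same way:

```python
# Reproduce both name variants the way clip.cpp formats its lookup keys.
OLD_PATTERN = "mm.mlp.%d.%s"       # what clip.cpp looked up before the edit
NEW_PATTERN = "mm.mlp.mlp.%d.%s"   # what the MobileVLM V2 GGUF actually contains

old_name = OLD_PATTERN % (0, "weight")  # "mm.mlp.0.weight"
new_name = NEW_PATTERN % (0, "weight")  # "mm.mlp.mlp.0.weight"
```

Looking up `old_name` in a file that only contains `new_name` fails, which is why the tensor is never found until the patterns are updated.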

zoahmed-xyz commented 8 months ago

With `llava-cli` and your branch of the `llama.cpp` repo, I get the error:

ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   567.56 MiB, (  568.31 / 10922.67)
libc++abi: terminating due to uncaught exception of type std::runtime_error: clip_model_load: don't support projector with:  currently

I am using the gguf models from the Google Drive link you provided.

Full command used is:

./llava-cli -m /Users/<username>/vlm-realtime/alt-llama.cpp/models/mobile-vlm-v2/ggml-model-q4_k.gguf --mmproj /Users/<username>/vlm-realtime/alt-llama.cpp/models/mobile-vlm-v2/mmproj-model-f16.gguf --image /Users/<username>/scenery.jpg -c 4096
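The empty string in `don't support projector with:  currently` suggests the mmproj file's projector-type metadata is missing, or is a type that particular build does not recognize (e.g. a binary built without PEG support reading a PEG-projector GGUF). As a hedged illustration (plain Python, not the actual clip.cpp code, and the supported set here is hypothetical), the dispatch that produces this message behaves roughly like:

```python
# Illustration: a loader that rejects unknown/empty projector-type strings.
# SUPPORTED_PROJECTORS is hypothetical; the real list depends on the
# llama.cpp build/branch being used.
SUPPORTED_PROJECTORS = {"mlp", "ldp", "peg"}

def check_projector(proj_type: str) -> str:
    if proj_type not in SUPPORTED_PROJECTORS:
        # Mirrors the shape of the clip_model_load error message above
        raise RuntimeError(
            f"clip_model_load: don't support projector with: {proj_type} currently")
    return proj_type
```

If the type string read from the GGUF is empty, the error message prints with nothing between "with:" and "currently", exactly as in the log above; checking that the mmproj file was produced with `--projector-type peg` by a matching branch is the first thing to verify.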

rezacopol commented 6 months ago

I'm getting the same error

lijianxing123 commented 5 months ago

Why are the results from llama.cpp inconsistent with the results from running this project?

llama.cpp command:

    ./llava-cli -m /mnt/nas_data2/wb_space/MobileVLMV2/MobileVLM_V2-1.7B_bk/ggml-model-f32.gguf --mmproj /mnt/nas_data2/wb_space/MobileVLMV2/MobileVLM_V2-1.7B_bk/mmproj-model-f16.gguf --image /mnt/nas_data2/wb_space/MobileVLMV2/assets/samples/demo.jpg -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER:please describe this images ASSISTANT:" --temp 0 --top-p 1 -c 4096

Result: The image is a digital art piece that captures the essence of history through its depiction. It features an illustration from "The Story Of World History" by Susan Wise Bauer, Revised Edition: Volume II - From Rome to Middle Ages (Volume 2)

Running this project with PyTorch (the duplicate `"image_file": i` entry in my original snippet was a paste error and is dropped here):

    model_path = './MobileVLM_V2-1.7B'
    image_file = "assets/samples/demo.jpg"
    prompt_str = "please describe this images"
    args = type('Args', (), {
        "model_path": model_path,
        "image_file": image_file,
        "prompt": prompt_str,
        "conv_mode": "v1",
        "temperature": 0,
        "top_p": None,
        "num_beams": 1,
        "max_new_tokens": 512,
        "load_8bit": False,
        "load_4bit": False,
    })()

    inference_once(args)

Result: 🚀 MobileVLM_V2-1.7B: The image is a vivid depiction of the cover of a book titled "The Story of the World: History for the Classical Child, Vol. 2: The Middle Ages, Volume 2: The Fall of Rome to the Rise of the Normans (Revised Edition)". The cover art is a captivating illustration of a knight on horseback, armed with a bow and arrow, poised for battle. The title of the book, "The Story of the World: History for the Classical Child, Vol. 2: The Middle Ages, Volume 2: The Fall of Rome to the Rise of the Normans", is prominently displayed in large, bold letters at the top of the cover. The author's name, Susan Wise Bauer, is also visible, indicating her authorship of the book. The overall design of the cover suggests a theme of adventure and exploration, fitting for a book about history.

lijianxing123 commented 5 months ago

@YangYang-DLUT