Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

Unable to run inference using llama.cpp #54

Open Hardik-Choraria opened 2 months ago

Hardik-Choraria commented 2 months ago

I am using a MacBook Pro M2, and when running

```sh
./llama-llava-cli -m ../MobileVLM-1.7B/ggml-model-q4_k.gguf \
    --mmproj ../MobileVLM-1.7B/mmproj-model-f16.gguf \
    --image ../paella.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: \nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:"
```

I get this error:

```
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
llama_kv_cache_init: Metal KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: Metal compute buffer size = 84.00 MiB
llama_new_context_with_model: CPU compute buffer size = 8.01 MiB
llama_new_context_with_model: graph nodes = 774
llama_new_context_with_model: graph splits = 2
ggml_metal_graph_compute_block_invoke: error: unsupported op 'HARDSWISH'
GGML_ASSERT: ggml/src/ggml-metal.m:934: !"unsupported op"
zsh: abort      ./llama-llava-cli -m ../MobileVLM-1.7B/ggml-model-q4_k.gguf --mmproj --image
```
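For context, the assert fires because the compute graph contains a `HARDSWISH` node that the ggml Metal backend in this build has no kernel for; as far as I can tell it comes from the MobileVLM vision projector (mmproj) rather than from the language model itself. Below is a hedged workaround sketch, not a verified fix: `-ngl 0` and the Metal CMake options are standard llama.cpp knobs, but whether they actually bypass this assert depends on which graph (LLM vs. mmproj) your llama.cpp revision offloads to Metal.

```sh
# Sketch 1 (assumption): keep all LLM layers on the CPU. The vision projector
# may still be scheduled on Metal depending on how llama.cpp was built, in
# which case this alone will not avoid the HARDSWISH assert.
./llama-llava-cli -m ../MobileVLM-1.7B/ggml-model-q4_k.gguf \
    --mmproj ../MobileVLM-1.7B/mmproj-model-f16.gguf \
    --image ../paella.jpg \
    -ngl 0 \
    -p "<same prompt as above>"

# Sketch 2 (assumption): rebuild llama.cpp with the Metal backend disabled so
# everything runs on the CPU backend. The option name depends on the revision:
# older trees use -DLLAMA_METAL=OFF, newer ones use -DGGML_METAL=OFF.
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release -j
```

If the CPU-only build runs, that would confirm the problem is just the missing Metal kernel for `HARDSWISH` rather than the model files themselves.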