Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Bug: Cannot use llama 3.2 vision #615

Open GeorgelPreput opened 4 days ago

GeorgelPreput commented 4 days ago

Contact Details

georgelpreput@mailbox.org

What happened?

I tried to adapt the command for running Llava to work with Llama 3.2 (which supposedly also has vision), but couldn't get it to work. From the docs:

./llava-v1.5-7b-q4.llamafile --temp 0.2 --image lemurs.jpg -e -p '### User: What do you see?\n### Assistant:'

Which I turned into:

./Llama-3.2-3B-Instruct.Q6_K.llamafile --image ./images/1.jpg --mmproj -e -p '### User: What do you see?\n### Assistant:'

Initially I tried without the --mmproj parameter, but llamafile asked specifically for that parameter to be present:

note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
Log start
./Llama-3.2-3B-Instruct.Q6_K.llamafile: fatal error: --mmproj must also be passed when an --image is specified in cli mode

Running without -e also doesn't work.
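For reference, --mmproj seems to expect a path to a multimodal projector (CLIP) GGUF as its argument, so in the command above -e is presumably being consumed as that path; the "no -e file found in zip archive" error in the log below appears to confirm this. Something like the following would at least be what the CLI expects syntactically (mmproj-model.gguf is a hypothetical file name; as far as I know no vision projector exists for this text-only 3B model):

# hypothetical: mmproj-model.gguf stands in for a vision projector GGUF, which Llama 3.2 3B does not provide
./Llama-3.2-3B-Instruct.Q6_K.llamafile --image ./images/1.jpg --mmproj mmproj-model.gguf -e -p '### User: What do you see?\n### Assistant:'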

Version

llamafile v0.8.16

What operating system are you seeing the problem on?

Linux

Relevant log output

❯ ./Llama-3.2-3B-Instruct.Q6_K.llamafile --image ./images/1.jpg --mmproj -e -p '### User: What do you see?\n### Assistant:'
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
Log start
Log start
Cmd: ./Llama-3.2-3B-Instruct.Q6_K.llamafile -m Llama-3.2-3B-Instruct.Q6_K.gguf --image ./images/1.jpg --mmproj -e -p "### User: What do you see?\n### Assistant:"
llama_model_loader: loaded meta data with 28 key-value pairs and 255 tensors from Llama-3.2-3B-Instruct.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                         general.size_label str              = 3.2B
llama_model_loader: - kv   3:                            general.license str              = llama3.2
llama_model_loader: - kv   4:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   5:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   6:                          llama.block_count u32              = 28
llama_model_loader: - kv   7:                       llama.context_length u32              = 131072
llama_model_loader: - kv   8:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv   9:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  10:                 llama.attention.head_count u32              = 24
llama_model_loader: - kv  11:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  12:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  13:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  15:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  16:                          general.file_type u32              = 18
llama_model_loader: - kv  17:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  18:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  19:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  20:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  21:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  22:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  23:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  24:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  25:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  27:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   58 tensors
llama_model_loader: - type q6_K:  197 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 24
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 3
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q6_K
llm_load_print_meta: model params     = 3.21 B
llm_load_print_meta: model size       = 2.45 GiB (6.56 BPW) 
llm_load_print_meta: general.name     = n/a
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size =    0.14 MiB
llm_load_tensors:        CPU buffer size =  2513.90 MiB
.................................................................................
/home/georgelpreput/Downloads/Llama-3.2-3B-Instruct.Q6_K.llamafile: error: no -e file found in zip archive
libc++abi: terminating due to uncaught exception of type std::runtime_error: clip_model_load: failed to load CLIP model from -e. Does this file exist?

error: Uncaught SIGABRT (SI_TKILL) at 0x3e80000495c on laptop pid 18780 tid 18780
  ./Llama-3.2-3B-Instruct.Q6_K.llamafile
  No such file or directory
  Linux Cosmopolitan 3.9.6 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Fri, 01 Nov 2024 03:30:41 +0000 laptop 6.11.6-arch1-1

RAX 0000000000000000 RBX 0000000000000006 RDI 000000000000495c
RCX 00000000008fe661 RDX 0000000000000000 RSI 0000000000000006
RBP 00007fffefe8cf70 RSP 00007fffefe8cf70 RIP 00000000008fe661
 R8 0000000000000000  R9 0000000000000000 R10 00000000008fe661
R11 0000000000000296 R12 0000000000a6a7a8 R13 0000000000901410
R14 0000000000a6a790 R15 00007fffefe8e681
TLS 0000000000b2ae40

XMM0  00000000000000000000000000000000 XMM8  00000000000000000000000000000000
XMM1  00000000000000000000000000000000 XMM9  ffffffffffffffffffffffffffffffff
XMM2  000000000000000000007fffefe8c090 XMM10 ffffffffffffffffffffffffffffffff
XMM3  206f742064656c696166203a64616f6c XMM11 00000000000000000000000000000000
XMM4  696620736968742073656f44202e652d XMM12 00000000000000000000000000000000
XMM5  206d6f7266206c65646f6d2050494c43 XMM13 ffffffffffffffffffffffffffffffff
XMM6  2064616f6c206f742064656c69616620 XMM14 00000000000000000000000000000000
XMM7  3a64616f6c5f6c65646f6d5f70696c63 XMM15 00000000000000000000000000000000

cosmoaddr2line /home/georgelpreput/Downloads/Llama-3.2-3B-Instruct.Q6_K.llamafile 8fe661 8e96eb 407811 9012ba 9013be 8c2f56 8c2b82 4a2f35 4aee37 4ae267 41cdca 404204 4015f4

note: pledge() sandboxing makes backtraces not as good
7fffefe89d80 8fe661 systemfive_linux+31
7fffefe8cf70 8e96eb raise+107
7fffefe8cf90 407811 abort+40
7fffefe8cfb0 9012ba abort_message+202
7fffefe8d0a0 9013be demangling_terminate_handler()+206
7fffefe8d0e0 8c2f56 std::__terminate(void (*)())+6
7fffefe8d0f0 8c2b82 __cxa_throw+82
7fffefe8d120 4a2f35 clip_model_load+22709
7fffefe8d400 4aee37 llava_init_context(gpt_params*, llama_model*)+167
7fffefe8d530 4ae267 llava_cli+2935
7fffefe8ded0 41cdca main+1370
7fffefe8ef50 404204 cosmo+68
7fffefe8ef60 4015f4 _start+125

000000400000-000000a811e0 r-x-- 6660kb
000000a82000-0000031de000 rw--- 39mb
0006fe000000-0006fe001000 rw-pa 4096b
7ac2cb000000-7ac2cf800000 rw-pa 72mb
7ac2cf9f0000-7ac36d350080 r--s- 2521mb
7ac36d400000-7ac36d800000 rw-pa 4096kb
7ac36d880000-7ac36d881000 ---pa 4096b
7ac36d881000-7ac36d894000 rw-pa 76kb
7ac36d894000-7ac36d895000 ---pa 4096b
7ac36d895000-7ac36d8a8000 rw-pa 76kb
7ac36d8a8000-7ac36d8a9000 ---pa 4096b
7ac36d8a9000-7ac36d8bc000 rw-pa 76kb
7ac36d8bc000-7ac36d8bd000 ---pa 4096b
7ac36d8bd000-7ac36d9f6000 rw-pa 1252kb
7ac36d9f6000-7ac4149738a1 r--s- 2671mb
# 5'583'839'232 bytes in 18 mappings

./Llama-3.2-3B-Instruct.Q6_K.llamafile -m Llama-3.2-3B-Instruct.Q6_K.gguf --image ./images/1.jpg --mmproj -e -p ### User: What do you see?\n### Assistant:
wirthual commented 2 days ago

Hi,

llamafile builds on top of the llama.cpp project.

Currently it seems llama.cpp does not support Llama 3.2 vision models.

See here

And also here

yurawagner commented 11 hours ago

If I got it correctly, Llama 3.2 1B and 3B do NOT have the vision feature (see attached screenshots).