Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

GGML_ASSERT(ggml-metal.m:1645): false #42

Closed by jpillora 10 months ago

jpillora commented 11 months ago

I have Xcode installed, but I'm still getting this on my M1 (8 GB):

llm_load_tensors: VRAM used: 0.00 MB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  400.00 MB
llama_build_graph: non-view tensors processed: 924/924
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/var/folders/_0/00tff1yd64lf3vxzhmldx8100000gn/T/.llamafile/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  5461.34 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 81.63 MB
llama_new_context_with_model: max tensor size =   128.18 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  4096.00 MB, offs =            0
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3533.78 MB, offs =   4160536576, ( 7630.41 /  5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   400.02 MB, ( 8030.42 /  5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    75.02 MB, ( 8105.44 /  5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 2 failed with status 5
GGML_ASSERT: /var/folders/_0/00tff1yd64lf3vxzhmldx8100000gn/T//.llamafile/ggml-metal.m:1645: false
zsh: abort      sh -c ./wizardcoder13b --host 0.0.0.0

Maybe not enough RAM?
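That seems likely. A quick back-of-the-envelope sum over the buffer sizes that `ggml_metal_add_buffer` reports in the log above shows the allocation well past the device's `recommendedMaxWorkingSetSize` (rough sketch; the log's running total is slightly higher than this sum because Metal adds some alignment padding):

```python
# Buffer sizes reported by ggml_metal_add_buffer in the log above (MB)
buffers = {
    "data (part 1)": 4096.00,
    "data (part 2)": 3533.78,
    "kv":            400.02,
    "alloc":         75.02,
}
recommended_max = 5461.34  # recommendedMaxWorkingSetSize on this 8 GB M1

total = sum(buffers.values())
print(f"total allocated: {total:.2f} MB")   # ~8104.82 MB
print(f"over budget by:  {total - recommended_max:.2f} MB")
```

So the model plus KV cache needs roughly 2.6 GB more than Metal recommends for this machine, which matches the three "greater than the recommended max working set size" warnings.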

jart commented 11 months ago

The most I'm able to glean here is that status 5 means MTLCommandBufferStatus.error.

Do you know if the issue exists for you in the upstream llama.cpp project?
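For reference, the raw values of Apple's `MTLCommandBufferStatus` enum can be sketched as a small lookup (a Python stand-in for the Objective-C enum, values per Apple's Metal documentation):

```python
# MTLCommandBufferStatus raw values, per Apple's Metal documentation
MTL_COMMAND_BUFFER_STATUS = {
    0: "notEnqueued",
    1: "enqueued",
    2: "committed",
    3: "scheduled",
    4: "completed",
    5: "error",  # the status ggml_metal_graph_compute reported above
}

print(MTL_COMMAND_BUFFER_STATUS[5])  # error
```

Unfortunately `.error` is a catch-all for any GPU-side failure, so the status alone doesn't say whether it was an out-of-memory condition, a timeout, or something else.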

jpillora commented 11 months ago

Just tested, and the llava model works.

I'll run llama.cpp and report back.

Great project btw!


jart commented 10 months ago

Closing due to inactivity. If you end up reproducing the issue upstream, please file an issue there so it can be fixed, and we'll eventually cherry-pick it into this project. If the bug doesn't exist upstream then let us know, and I'll reopen this.