Open Ysf101 opened 2 weeks ago
It seems the issue is related to memory handling: it happens when I give it a fairly large prompt.
Update: I switched to using the CPU, and now the models initialize successfully, but when I give a somewhat complex prompt I get this error:
Process 836 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000000000000
error: memory read failed for 0x0
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x0000000000000000
frame #1: 0x00000001088c011d libggml_llama.dylib`ggml_backend_blas_graph_compute(ggml_backend*, ggml_cgraph*) + 3453
frame #2: 0x0000000108886d67 libggml_llama.dylib`ggml_backend_sched_graph_compute_async + 1447
frame #3: 0x00000001086cff11 libllama.dylib`llama_decode + 2609
frame #4: 0x0000000105289972 libffi.8.dylib`ffi_call_unix64 + 82
frame #5: 0x000000010528916b libffi.8.dylib`ffi_call_int + 827
frame #6: 0x0000000105288d6b libffi.8.dylib`ffi_call + 219
frame #7: 0x000000010635848e _ctypes.cpython-312-darwin.so`_ctypes_callproc + 638
frame #8: 0x0000000106352437 _ctypes.cpython-312-darwin.so`PyCFuncPtr_call + 279
frame #9: 0x0000000100206ee8 python`_PyEval_EvalFrameDefault + 265704
frame #10: 0x000000010007eb8e python`_PyObject_Call_Prepend + 334
frame #11: 0x00000001001174f9 python`slot_tp_call + 105
frame #12: 0x000000010020c216 python`_PyEval_EvalFrameDefault + 286998
frame #13: 0x00000001001c369d python`PyEval_EvalCode + 253
frame #14: 0x0000000100290c50 python`run_mod + 272
frame #15: 0x0000000100290a13 python`pyrun_file + 147
frame #16: 0x00000001002903fe python`_PyRun_SimpleFileObject + 302
frame #17: 0x000000010028fd02 python`_PyRun_AnyFileObject + 66
frame #18: 0x00000001002bdc3c python`pymain_run_file_obj + 204
frame #19: 0x00000001002bd6f9 python`pymain_run_file + 89
frame #20: 0x00000001002bcd40 python`Py_RunMain + 1376
frame #21: 0x00000001002bdf79 python`pymain_main + 505
frame #22: 0x0000000100001279 python`main + 57
frame #23: 0x00007ff811cac310 dyld`start + 2432
(lldb) ^D
Hi @Ysf101, in the LLaVA architecture one image takes 729+ tokens, so with a long prompt it is easy to hit an OOM error on a Mac laptop. We have addressed this issue and plan to propose a new multimodal model architecture to support it.
It seems a similar issue has also been reported in llama.cpp: https://github.com/ggerganov/llama.cpp/issues/4880
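To make the token arithmetic in the comment above concrete, here is a minimal sketch of a context-budget check. It takes the "729+" figure from this thread as a lower bound per image; the function name, the 2048-token context size, and the 256-token output reserve are illustrative assumptions, not values taken from the Nexa SDK or llama.cpp. Whether the crash in this thread is actually caused by exceeding the context or memory budget has not been established.

```python
# Lower bound on tokens consumed by one image, as quoted in this thread ("729+").
IMAGE_TOKENS = 729


def fits_in_context(prompt_tokens, n_images, n_ctx,
                    reserve_for_output=256, image_tokens=IMAGE_TOKENS):
    """Return (fits, needed): whether prompt + images + reply fit in n_ctx.

    All parameters are illustrative; n_ctx must match whatever context size
    the model was actually loaded with.
    """
    needed = prompt_tokens + n_images * image_tokens + reserve_for_output
    return needed <= n_ctx, needed


if __name__ == "__main__":
    # Assumed context window of 2048 tokens, one image.
    for prompt_tokens in (100, 1200):
        fits, needed = fits_in_context(prompt_tokens, n_images=1, n_ctx=2048)
        print(f"prompt={prompt_tokens} needed={needed} fits={fits}")
```

With these assumed numbers, a single image already uses more than a third of the window before any prompt text is counted.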
Hi @zhiyuan8, thank you for the information. To add to my initial bug report: I tried running NexaAI with 5 GB of RAM, then with 12 GB of RAM, and also on my friend's laptop (we both run macOS in VMware and use NexaAI inside it). It produced the same segmentation fault on my friend's laptop that I was getting. The input does not need to be long, as the example below shows:
(ysf) ysf@ysfs % nexa run Qwen2.5-0.5B-Instruct:q2_K
Model Qwen2.5-0.5B-Instruct:q2_K already exists at /Users/ysf/.cache/nexa/hub/official/Qwen2.5-0.5B-Instruct/q2_K.gguf
>>> hi
assistant: Hello! How can I help you today?
>>> hiiiii
assistant: Hello! How can I assist you today?
>>> hiiiiiiiiii
zsh: segmentation fault nexa run Qwen2.5-0.5B-Instruct:q2_K
Meanwhile, a Windows VM with 4 GB of RAM running NexaAI in VMware runs all models perfectly, so it is the same machine and setup; only the OS differs.
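When comparing machines like this, it can help to script the reproduction so that a segmentation fault is detected the same way everywhere. The sketch below is POSIX-only and uses only the Python standard library: a child process killed by a signal reports a negative return code, and setting PYTHONFAULTHANDLER=1 asks any Python child to print its Python stack when it crashes. The helper name `run_and_check` is hypothetical, and feeding prompts to `nexa run` through stdin is an untested assumption.

```python
import os
import signal
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    returncode: int
    crashed: bool                 # True if the child was killed by a signal
    signal_name: Optional[str]    # e.g. "SIGSEGV" for a segmentation fault
    stderr: str


def run_and_check(cmd: Sequence[str], stdin_text: str = "",
                  timeout: float = 300.0) -> RunResult:
    """Run cmd, send stdin_text to it, and report whether a signal killed it."""
    # PYTHONFAULTHANDLER=1 makes a Python child dump its stack on fatal signals.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(list(cmd), input=stdin_text, capture_output=True,
                          text=True, env=env, timeout=timeout)
    signal_name = None
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child died from signal N.
        try:
            signal_name = signal.Signals(-proc.returncode).name
        except ValueError:
            signal_name = f"signal {-proc.returncode}"
    return RunResult(proc.returncode, proc.returncode < 0, signal_name,
                     proc.stderr)


# Hypothetical usage, assuming the CLI accepts prompts on stdin:
# result = run_and_check(["nexa", "run", "Qwen2.5-0.5B-Instruct:q2_K"],
#                        stdin_text="hi\nhiiiii\nhiiiiiiiiii\n")
# print(result.crashed, result.signal_name)
# print(result.stderr)
```

Attaching the captured stderr to the report gives maintainers the Python-level stack alongside the native backtrace from lldb.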
@zhiyuan8 an update: I upgraded to macOS Sequoia and the issue is completely solved for text models, but it still occurs with vision models.
(ysf) ysf@ysfs-Mac % nexa run nanollava
Model nanoLLaVA:model-fp16 already exists at /Users/ysf/.cache/nexa/hub/official/nanoLLaVA/model-fp16.gguf
Model nanoLLaVA:projector-fp16 already exists at /Users/ysf/.cache/nexa/hub/official/nanoLLaVA/projector-fp16.gguf
⠋ zsh: segmentation fault nexa run nanollava
(ysf) ysf@ysfs-Mac %
Issue Description
I'm encountering a segmentation fault when initializing NexaVLMInference using the nexa.gguf package on macOS Ventura.
Steps to Reproduce
Install nexaai via pip install nexaai.
Environment Details:
macOS Ventura version: [e.g., 13.0]
Python version: 3.12.0
nexaai version: [0.0.9.0]
Architecture: x86_64
OS
macOS Ventura
Python Version
3.12.0
Nexa SDK Version
0.0.9.0
GPU (if using one)
No response
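The environment fields above still contain a template placeholder ("[e.g., 13.0]"). A small standard-library script can fill them in with exact values. This is only a sketch: the function name `collect_environment` is hypothetical, and the package name `nexaai` is taken from the pip command in this report.

```python
import platform
from importlib import metadata


def collect_environment(package="nexaai"):
    """Gather the OS, Python, architecture, and package version for a report."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "os": platform.system(),
        # mac_ver() returns an empty release string on non-macOS systems.
        "os_release": platform.mac_ver()[0] or platform.release(),
        "python": platform.python_version(),
        "architecture": platform.machine(),
        package: pkg_version,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```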