Closed. taegeonum closed this issue 1 week ago.
It seems that the problem occurs when constructing the QNN computing graphs. The program might crash due to a lack of memory; it's hard to tell the reason from the message you provided. Could you turn on the 'DEBUG' option in cmake and give us more logs, along with the information of your test device?
@oreomaker Thanks for your help. I'm using an S24U (12 GB RAM). This is the tail of the log. I'm wondering why it consumes so much memory even when running a small model (quantized qwen-1.8b).
Memory Usage: 2069 MB(22435) at: before graph finilize
Memory Usage: 2086 MB(22490) at: after graph finilize
input tensors num:2
output tensors num:3
qnn backend setup tensors
graph name: Prompt_Graph.42
cpu backend
model.layers.10.self_attn.qkv_split reshape:
|| Input outtensor-model.layers.10.self_attn.qkv_merge-00 shape: 1 1024 16 128 (2097152) |
|| Output outtensor-model.layers.10.self_attn.qkv_split-00 shape: 1 64 16 128 (131072) |Output outtensor-model.layers.10.self_attn.qkv_split-01 shape: 1 64 16 128 (131072) |Output outtensor-model.layers.10.self_attn.qkv_split-02 shape: 1 64 16 128 (131072) |Output outtensor-model.layers.10.self_attn.qkv_split-03 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.q_rope reshape:
|| Input outtensor-model.layers.10.self_attn.qkv_split-00 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.q_rope-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.k_rope reshape:
|| Input outtensor-model.layers.10.self_attn.qkv_split-01 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.k_rope-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.k_cache reshape:
|| Input outtensor-model.layers.10.self_attn.k_rope-00 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.k_cache-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.v_cache reshape:
|| Input outtensor-model.layers.10.self_attn.qkv_split-02 shape: 1 16 128 64 (131072) |
|| Output outtensor-model.layers.10.self_attn.v_cache-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.qk reshape:
|| Input outtensor-model.layers.10.self_attn.q_rope-00 shape: 1 64 16 128 (131072) |Input outtensor-model.layers.10.self_attn.k_cache-00 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.qk-00 shape: 1 64 16 64 (65536) |
model.layers.10.self_attn.softmax reshape:
|| Input outtensor-model.layers.10.self_attn.qk-00 shape: 1 64 16 64 (65536) |
|| Output outtensor-model.layers.10.self_attn.softmax-00 shape: 1 64 16 64 (65536) |
model.layers.10.self_attn.qkv reshape:
|| Input outtensor-model.layers.10.self_attn.softmax-00 shape: 1 64 16 64 (65536) |Input outtensor-model.layers.10.self_attn.v_cache-00 shape: 1 16 128 64 (131072) |
|| Output outtensor-model.layers.10.self_attn.qkv-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.o_proj.quantize reshape:
|| Input outtensor-model.layers.10.self_attn.qkv-00 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.o_proj.quantize-00 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.or_merge reshape:
|| Input outtensor-model.layers.10.self_attn.o_proj.quantize-00 shape: 1 64 16 128 (131072) |Input outtensor-model.layers.10.self_attn.qkv_split-03 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.or_merge-00 shape: 1 320 16 128 (655360) |
---------QNN alloc
cpu backend - reshape, setup tensors
graph name: Prompt_Graph.43
qnn backend
qnn backend cast
model.layers.10.self_attn.or_split reshape:
|| Input outtensor-model.layers.10.self_attn.or_merge-00 shape: 1 320 16 128 (655360) |
|| Output outtensor-model.layers.10.self_attn.or_split-00 shape: 1 64 16 128 (131072) |Output outtensor-model.layers.10.self_attn.or_split-01 shape: 1 64 16 128 (131072) |
model.layers.10.self_attn.or_split-00_view_ reshape:
|| Input outtensor-model.layers.10.self_attn.or_split-00 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.or_split-00_view_-00 shape: 1 32 2 2048 (131072) |
model.layers.10.self_attn.or_split-01_view_ reshape:
|| Input outtensor-model.layers.10.self_attn.or_split-01 shape: 1 64 16 128 (131072) |
|| Output outtensor-model.layers.10.self_attn.or_split-01_view_-00 shape: 1 64 1 2048 (131072) |
model.layers.10.self_attn.o_proj reshape:
|| Input outtensor-model.layers.10.self_attn.or_split-00_view_-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.self_attn.o_proj-00 shape: 1 32 2 2048 (131072) |
model.layers.10.self_attn.o_proj.dequantize reshape:
|| Input outtensor-model.layers.10.self_attn.o_proj-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.self_attn.o_proj.dequantize-00 shape: 1 32 2 2048 (131072) |
model.layers.10.self_attn.o_proj.dequantize-00_view_ reshape:
|| Input outtensor-model.layers.10.self_attn.o_proj.dequantize-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.self_attn.o_proj.dequantize-00_view_-00 shape: 1 64 1 2048 (131072) |
model.layers.10.self_attn.o_proj.dequantize-00_view_-00_add_ reshape:
|| Input outtensor-model.layers.10.self_attn.o_proj.dequantize-00_view_-00 shape: 1 64 1 2048 (131072) |Input outtensor-model.layers.10.self_attn.or_split-01_view_-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.layers.10.self_attn.o_proj.dequantize-00_view_-00_add_-00 shape: 1 64 1 2048 (131072) |
model.layers.10.post_attention_layernorm reshape:
|| Input outtensor-model.layers.10.self_attn.o_proj.dequantize-00_view_-00_add_-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.layers.10.post_attention_layernorm-00 shape: 1 64 1 2048 (131072) |
model.layers.10.mlp.up_proj.quantize reshape:
|| Input outtensor-model.layers.10.post_attention_layernorm-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.up_proj.quantize-00 shape: 1 64 1 2048 (131072) |
model.layers.10.mlp.up_proj.quantize-00_view_ reshape:
|| Input outtensor-model.layers.10.mlp.up_proj.quantize-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.up_proj.quantize-00_view_-00 shape: 1 32 2 2048 (131072) |
model.layers.10.mlp.gate_proj reshape:
|| Input outtensor-model.layers.10.mlp.up_proj.quantize-00_view_-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.gate_proj-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.up_proj reshape:
|| Input outtensor-model.layers.10.mlp.up_proj.quantize-00_view_-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.up_proj-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.gate_proj.dequantize reshape:
|| Input outtensor-model.layers.10.mlp.gate_proj-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.gate_proj.dequantize-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.up_proj.dequantize reshape:
|| Input outtensor-model.layers.10.mlp.up_proj-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.up_proj.dequantize-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.silu reshape:
|| Input outtensor-model.layers.10.mlp.gate_proj.dequantize-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.silu-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.silu-00_mul_ reshape:
|| Input outtensor-model.layers.10.mlp.silu-00 shape: 1 32 2 5504 (352256) |Input outtensor-model.layers.10.mlp.up_proj.dequantize-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.silu-00_mul_-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.down_proj.quantize reshape:
|| Input outtensor-model.layers.10.mlp.silu-00_mul_-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.down_proj.quantize-00 shape: 1 32 2 5504 (352256) |
model.layers.10.mlp.down_proj reshape:
|| Input outtensor-model.layers.10.mlp.down_proj.quantize-00 shape: 1 32 2 5504 (352256) |
|| Output outtensor-model.layers.10.mlp.down_proj-00 shape: 1 32 2 2048 (131072) |
model.layers.10.mlp.down_proj.dequantize reshape:
|| Input outtensor-model.layers.10.mlp.down_proj-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.down_proj.dequantize-00 shape: 1 32 2 2048 (131072) |
model.layers.10.mlp.down_proj.dequantize-00_view_ reshape:
|| Input outtensor-model.layers.10.mlp.down_proj.dequantize-00 shape: 1 32 2 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.down_proj.dequantize-00_view_-00 shape: 1 64 1 2048 (131072) |
model.layers.10.mlp.down_proj.dequantize-00_view_-00_add_ reshape:
|| Input outtensor-model.layers.10.mlp.down_proj.dequantize-00_view_-00 shape: 1 64 1 2048 (131072) |Input outtensor-model.layers.10.self_attn.o_proj.dequantize-00_view_-00_add_-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.layers.10.mlp.down_proj.dequantize-00_view_-00_add_-00 shape: 1 64 1 2048 (131072) |
qnn backend reshape
---------QNN alloc
---------QNN alloc
model.layers.10.self_attn.or_split-00_view_ input type:16
model.layers.10.self_attn.or_split-00_view_ output type:16
model.layers.10.self_attn.or_split-00_view_is QNN INT8 op
model.layers.10.self_attn.or_split-01_view_ input type:0
model.layers.10.self_attn.or_split-01_view_ output type:0
model.layers.10.self_attn.o_proj.dequantize-00_view_ input type:0
model.layers.10.self_attn.o_proj.dequantize-00_view_ output type:0
model.layers.10.mlp.up_proj.quantize-00_view_ input type:16
model.layers.10.mlp.up_proj.quantize-00_view_ output type:16
model.layers.10.mlp.up_proj.quantize-00_view_is QNN INT8 op
model.layers.10.mlp.down_proj.dequantize-00_view_ input type:0
model.layers.10.mlp.down_proj.dequantize-00_view_ output type:0
(crash)
It is a segmentation fault on the OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB DDR) too!
load time: 1474.77 ms token time: nan ms inference speed: nan tokens/s load time: 2678.93 ms token time: nan ms inference speed: nan tokens/s Segmentation fault
Hi,
Thank you for providing the precise debug log. The issue is most likely caused by insufficient memory. We recommend using a device with 16 GB or more to execute seq = 64 prefilling. The large memory footprint is not due to mllm-NPU itself, but to the Qualcomm QNN framework, which performs NPU graph finalization to optimize performance; you can find the corresponding log lines and code around the QNN graph-finalization step. You may try reducing the sequence length to 32 to save memory. I will try to reproduce the bug on a 12 GB smartphone.
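For a rough sense of why halving the sequence helps: if the 64 in the activation shapes in the log above tracks the sequence length, every per-chunk intermediate tensor scales linearly with it, e.g. the 2,097,152-element qkv_merge output at seq = 64 would drop to about 1,048,576 elements at seq = 32, and likewise for every other activation the graph keeps alive during finalization.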
Besides, when the crash occurred, were there any other QNN logs, such as:
[ ERROR ] <E> Failed to map weights buffer to device!
If such logs are present, it would more clearly indicate a memory insufficiency issue.
On the other hand, if no such logs are available, but the device shows a black screen, the adb shell terminal stops responding (frozen), and the device then reboots, that likely also indicates a memory insufficiency issue.
Additionally, does this memory bug only occur when -c = 0? If that's the case, I might need to further investigate the single-chunk prefilling. In my experience, seq = 64 typically requires around 10 GB of memory, so a total of 12 GB is not enough once OS memory usage is accounted for.
It is a segmentation fault on the OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB DDR) too!
Hi,
Thank you for your bug report. Could you please provide us with a detailed log by doing the following: turn on the 'DEBUG' option in cmake and give us more logs, along with the information of your test device?
I think your bug is not in the graph-building stage but in the prefilling and decoding stage, which is different from @taegeonum's case, since your run has already printed the inference-timing log.
If more detailed logs are available, they would greatly assist us in identifying and resolving the issue. Thank you very much for your willingness to help.
How about this issue?
@liang1232018 Thanks for your support and explanation.
When s=64, this error happens regardless of the c value, and it crashes silently without any exception; the following situation happens, as you mentioned:
On the other hand, if such logs are not available, but the device experiences a black screen, no responses from the adb shell terminal (frozen), and then rebooting the device, it likely also indicates a memory insufficiency issue.
When s=32, the exception is different. In s=32 and c=1, the following exception occurs. It happens in the prefill phase while executing npuExe.run:
[Q] <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant
[A] 5311.7ms [ ERROR ] Number of input elements 65536 does not match number of output elements 0.
5311.7ms [ ERROR ] Op specific validation failed.
0.0ms [ ERROR ] <E> validateNativeOps master op validator model.layers.0.self_attn.ires_split-00_view_:qti.aisw:Reshape failed 3110
0.0ms [ ERROR ] <E> QnnBackend_validateOpConfig failed 3110
0.0ms [ ERROR ] <E> Failed to validate op model.layers.0.self_attn.ires_split-00_view_ with error 0xc26
[ ERROR ] QnnModel::addNode() validating node model.layers.0.self_attn.ires_split-00_view_ failed.
[ ERROR ] qnnModels_[qnnModelIndex_].addNode( QNN_OPCONFIG_VERSION_1, name.c_str(), packageName.c_str(), nodeType.c_str(), paramsPtr, params.size(), inputTensorNames, inputTensorNames.size(), outputTensors.data(), outputTensors.size() ) expected MODEL_NO_ERROR, got MODEL_GRAPH_ERROR
0.0ms [ ERROR ] <E> Cannot destroy HexNN graph as PrepreLib is not loaded
0.0ms [ ERROR ] <E> Failed to destroy hexNNGraphHandle 0xaea28280
0.0ms [ ERROR ] <E> Failed to clean up hexNNGraph in htpGraph 0x1 with error 1000
0.0ms [WARNING] <W> Final cleanup: failed to clean up backend.
0.0ms [WARNING] <W> sg_stubPtr is not null, skip loadRemoteSymbols
In s=32 and c=0, it crashes silently, like the s=64 setting.
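For context on that validator error: a QNN Reshape is only accepted when the input and output hold the same number of elements, and the "number of output elements 0" in the log suggests the view's output dimensions were never populated for this s/c combination. A minimal sketch of the invariant being checked (hypothetical helper, not QNN's actual validator code):

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical check mirroring the validator message above: a Reshape
// must preserve the total element count, and an output of 0 elements
// (e.g. a dimension left at 0) fails op-specific validation.
bool reshapeIsValid(const std::vector<uint32_t>& inDims,
                    const std::vector<uint32_t>& outDims) {
    auto count = [](const std::vector<uint32_t>& dims) {
        return std::accumulate(dims.begin(), dims.end(), uint64_t{1},
                               std::multiplies<uint64_t>());
    };
    return count(outDims) != 0 && count(inDims) == count(outDims);
}
```

With the numbers from the log, the input has 65536 elements while the output reports 0, so this check fails before the node is ever added to the graph.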
Select Snapdragon SoCs support multiple cDSP process domains (PDs), and each process domain supports a virtual address space of 3.75 GB. The qwen-1.5-1.8b-chat-int8.mllm model is 3.4 GB, so there is not much address space left; I guess this may be associated with the crash.
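If both figures are accurate, the headroom per process domain is roughly 3.75 GB - 3.4 GB = 0.35 GB for activations, scratch buffers, and the QNN runtime itself, so exhausting the cDSP address space during graph setup or finalization seems plausible.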
Could you turn on the 'DEBUG' option in cmake and give more logs and the information of your test device?
OK, I created a new issue at https://github.com/UbiquitousLearning/mllm/issues/117; the detailed log is shown there.
Hello, I've executed main_qwen_npu following the guideline. In fact, there were minor bugs, so I've fixed them manually (e.g., a missing adb push ../vocab/qwen_merges.txt ...).
When I ran main_qwen_npu, Android crashed and forcibly rebooted. I've logged where it crashes with memory-usage checkpoints in QNNExecutor.cpp.
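The snippet itself isn't reproduced here; below is a minimal sketch of what such checkpoint logging might look like, assuming Linux/Android /proc is readable from the process. The output format imitates the "Memory Usage: ... at: ..." lines earlier in this thread, but the exact fields printed there are an assumption:

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical checkpoint logger: reads resident (VmRSS) and virtual
// (VmSize) memory from /proc/self/status and prints them with a tag,
// so the last printed tag shows roughly where a crash occurred.
static void logMemoryUsage(const char* tag) {
    long rssKb = 0, vmKb = 0;
    FILE* f = fopen("/proc/self/status", "r");
    if (!f) return;
    char line[256];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0)
            sscanf(line + 6, "%ld", &rssKb);
        else if (strncmp(line, "VmSize:", 7) == 0)
            sscanf(line + 7, "%ld", &vmKb);
    }
    fclose(f);
    printf("Memory Usage: %ld MB(%ld) at: %s\n",
           rssKb / 1024, vmKb / 1024, tag);
}

// e.g. logMemoryUsage("before graph finalize");
```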
When I set -c 0, Android crashes while casting Prompt_Graph.43 and performing qnn_graph->setUpTensors(name). May I ask what the root cause of this problem is?