RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

(Ubuntu x86_64) Segmentation Fault Running Q4_1_O Model #32

Closed · cryscan closed this 1 year ago

cryscan commented 1 year ago

System: Ubuntu 20.04.6 LTS
GCC: 9.4.0
CPU: Intel(R) Xeon(R) Platinum 8358P

Issue:

$ python rwkv/chat_with_bot.py /path/to/models/Raven-14B-v9-Q4.bin 
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
Loading RWKV model
Processing 92 prompt tokens, may take a while
Segmentation fault (core dumped)
saharNooby commented 1 year ago

Can it be reproduced with the 169M model?

BuilderGuy1 commented 1 year ago

I'm having the same issue on Apple Silicon. I tested the latest 14B & 7B, each with all 3 quantization options. I also tried the 3B v10 Q4_1_0.

dmahurin commented 1 year ago

Same issue on Apple M2, on all revisions with Q4_1_0. Tried 3B and 169M.

poisson-fish commented 1 year ago

Confirming this issue under Arch WSL with RWKV-4-Raven-14B-v9-Eng99%-Other1%-20230412-ctx8192_q4_1_0.bin

L-M-Sherlock commented 1 year ago

I also encountered this problem on my Mac M1. Tried 7B and 3B. The crash occurred before "Processing 92 prompt tokens" was printed.

saharNooby commented 1 year ago

Probably an address misalignment issue. I'm working on a fix in https://github.com/saharNooby/rwkv.cpp/pull/33
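
For illustration, the general technique for this class of bug is to round each tensor's data offset up to the boundary the math kernels expect. A minimal Python sketch of that rounding only; the 16-byte value and the helper name are assumptions, not taken from the PR:

def align_up(offset: int, alignment: int = 16) -> int:
    # Round offset up to the next multiple of alignment,
    # so the data that follows starts on an aligned address.
    return (offset + alignment - 1) // alignment * alignment

# e.g. align_up(35) == 48, align_up(32) == 32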

saharNooby commented 1 year ago

Alignment fix merged. Please clone the repo from scratch and try again:

git clone --recursive https://github.com/saharNooby/rwkv.cpp.git
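
If you already have a clone, pulling and syncing the submodules in place should also work (the exact commands below are a suggestion, not from this thread):

git pull
git submodule update --init --recursive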

L-M-Sherlock commented 1 year ago

Thanks! #33 solved my problem. But another bug appeared: the bot only repeats "> Bob: Hello, Bob."

[screenshot: chat output showing the bot repeating "Bob: Hello, Bob."]

python rwkv/convert_pytorch_to_ggml.py ./RWKV-4-Raven-3B-v9x-Eng49%-Chn50%-Other1%-20230417-ctx4096.pth ./rwkv.cpp-3B.bin float16
python rwkv/quantize.py ./rwkv.cpp-3B.bin ./rwkv.cpp-3B-Q4_1_0.bin 4
python rwkv/chat_with_bot.py ./rwkv.cpp-3B-Q4_1_0.bin
saharNooby commented 1 year ago

@BuilderGuy1 @poisson-fish @dmahurin If possible, can you also confirm that the segfault is fixed? (Please clone from scratch, or don't forget to update the git submodules.)

@L-M-Sherlock Thanks for the input. It looks like the default prompt does not work well with Raven; the related issue is https://github.com/saharNooby/rwkv.cpp/issues/22

dmahurin commented 1 year ago

Thanks @saharNooby. It works now on Apple M2 with 3B and 169M.

It also works with rwkv-4_raven-7b-v9 and rwkv-4_raven-14b-v9, though the 14B is slow on the M2.

saharNooby commented 1 year ago

Thanks for testing it!

poisson-fish commented 1 year ago

Sorry for the late reply @saharNooby. The previous crash is fixed; however, now I get a SIGSEGV:

python rwkv/chat_with_bot.py ./build/models/RWKV/RWKV-4-Raven-14B-v9-Eng99\%-Other1\%-20230412-ctx8192_q4_1_0.bin
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
Processing 92 prompt tokens, may take a while
~/Documents/Projects/cpp/llamapi/rwkv.cpp/rwkv/rwkv_cpp_model.py:100: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  state_out.storage().data_ptr(),
~/Documents/Projects/cpp/llamapi/rwkv.cpp/rwkv/rwkv_cpp_model.py:101: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  logits_out.storage().data_ptr()
fish: Job 1, 'python rwkv/chat_with_bot.py ./…' terminated by signal SIGSEGV (Address boundary error)

I can open a new issue if necessary.

Edit: disregard, it just required a submodule update and it works now.
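
Note on the TypedStorage deprecation warnings in the log above: they are only warnings, not the cause of the crash (which, per the edit above, was a stale submodule). They come from rwkv_cpp_model.py obtaining buffer addresses via storage().data_ptr(). For a freshly allocated contiguous tensor, Tensor.data_ptr() returns the same address without the warning; a minimal sketch, with the tensor name and size assumed rather than taken from the actual script:

import torch

logits_out = torch.zeros(50277, dtype=torch.float32)  # vocabulary-sized buffer (size assumed)
# data_ptr() is the address of the first element; for a contiguous tensor
# with zero storage offset it equals storage().data_ptr()
address = logits_out.data_ptr()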