Open · fydeos-alex opened 7 months ago
Hello, have you built the kernel with the provided rknpu files? I ran into trouble when building the kernel: "implicit declaration of function 'vm_flags_set'" and "implicit declaration of function 'vm_flags_clear'".
I just followed the doc and ran the Qwen model, but Phi-2 didn't work on the RK3588 8G NPU. You can try the newest openFyde version, which has already updated the kernel (`vm_flags_set()`/`vm_flags_clear()` only exist in newer kernel trees, so an older tree needs either an updated kernel or patched rknpu sources).
Thanks a lot! Have a nice day!
Found the same problem. I was able to run Qwen 1.8B on an Orange Pi 5 (RK3588S) 4GB, but can't run Phi-2.

@fydeos-alex What options did you pass to `llm.build()`? I used the default:

```python
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8', target_platform='rk3588')
```
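For context, the surrounding conversion script follows the style of the rkllm-toolkit example scripts; this is only a sketch, and the model path and output filename are placeholders:

```python
# Minimal sketch in the style of the rkllm-toolkit examples;
# '/path/to/phi-2' and the output name are placeholders.
from rkllm.api import RKLLM

llm = RKLLM()

ret = llm.load_huggingface(model='/path/to/phi-2')
if ret != 0:
    raise SystemExit('load_huggingface failed')

ret = llm.build(do_quantization=True, optimization_level=1,
                quantized_dtype='w8a8', target_platform='rk3588')
if ret != 0:
    raise SystemExit('build failed')

ret = llm.export_rkllm('./phi-2-w8a8.rkllm')
if ret != 0:
    raise SystemExit('export_rkllm failed')
```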
Your memory is too small. The Phi-2 model needs more than 4GB of RAM when running with INT8 quantization. I think this may also relate to your operating system's memory strategy. Hope this helps.
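A rough back-of-envelope check supports this; the parameter counts are the published ones, while cache and runtime overheads are guesses rather than measurements:

```python
# Weight sizes at w8a8 (1 byte per parameter).
# Published parameter counts: Qwen 1.8B ~1.8e9, Phi-2 ~2.7e9.
GiB = 1024 ** 3

def int8_weights_gib(n_params: float) -> float:
    return n_params / GiB  # 1 byte per weight under INT8

print(f"Qwen 1.8B weights: ~{int8_weights_gib(1.8e9):.1f} GiB")  # ~1.7 GiB
print(f"Phi-2 weights:     ~{int8_weights_gib(2.7e9):.1f} GiB")  # ~2.5 GiB
# KV cache, activations and the OS come on top, so Phi-2 plausibly
# exceeds a 4GB board while Qwen 1.8B still squeezes in.
```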
That doesn't seem logical to me, since Qwen 1.8B runs on my board and Phi-2 is barely bigger...

Besides, I was able to run a non-optimised Phi-2 on the same board through Ollama and OpenWebUI (which uses more RAM, due to the UI and the model being non-optimised). On top of that, even your 8GB board can't run it.

I believe this is some kind of bug. Were you able to run it in the end, @fydeos-alex?
In the end, Phi-2 didn't run successfully on my openFyde system, and I am still waiting for more low-level support from Rockchip. BTW, Ollama loads Phi-2 with INT4 quantization, while rkllm can only convert to INT8, which means the RKLLM model is twice the size of the Ollama one. @Pelochus
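The factor of two falls straight out of the bytes per weight; this sketch ignores quantization metadata such as per-group scales, which only add a few percent:

```python
# Approximate bytes per weight: INT4 ~0.5, INT8 = 1.0.
phi2_params = 2.7e9
for scheme, bytes_per_weight in [("INT4 (Ollama)", 0.5), ("INT8 (rkllm w8a8)", 1.0)]:
    print(f"{scheme}: ~{phi2_params * bytes_per_weight / 1024**3:.1f} GiB")
# -> ~1.3 GiB vs ~2.5 GiB, i.e. roughly 2x.
```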
Then that makes more sense. Still, you should be able to run it on your 8GB board anyway.

Let's see if Rockchip releases a new RKLLM version with support for more LLMs, a fixed and updated Phi-2, and INT4 optimisation...

A bit off-topic, but have you been able to run Llama 2 or TinyLlama? @fydeos-alex
No, Llama 2 is too big for me; you know, I also need to run the openFyde system, which already takes up some of my runtime memory. I haven't tried TinyLlama yet and don't know whether the chip supports it or not. Please let me know if it works well. Qwen is enough for me for now. 😇
I've been able to convert TinyLlama; however, it doesn't work for me. It makes sense that Llama 2 doesn't work on 8GB though; it would be nice to see if someone with 16GB or 32GB can try to run it. I can't even convert it to RKLLM format, due to the amount of RAM the conversion requires...
Nice try; let's wait for Rockchip to continue their work.
Same problem here.
@noah003 how much RAM, which SBC? Orange Pi 5?
You guys might find this useful:
Edit: More useful links
> @noah003 how much RAM, which SBC? Orange Pi 5?
16GB, Orange Pi 5
This error occurred when I ran the Phi-2 model on the RK3588 8G NPU. I ran Qwen 1.8B successfully on it, but Phi-2 didn't work. I am not sure whether the error was caused by the chip's memory. Other info:

BTW, the model load speed was awful; what can I do to improve the experience?
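On load speed: multi-gigabyte .rkllm files are often bottlenecked by storage read throughput (especially on SD cards) rather than by the NPU. A quick sanity check, with the path as a placeholder, is to time a raw sequential read of the converted model:

```python
# Time a raw sequential read of the model file to see whether storage
# bandwidth explains the slow load; '/path/to/model.rkllm' is a
# placeholder. The page cache can hide the true disk speed on a second
# run, so test from a fresh boot (or drop caches) for an honest number.
import time

path = "/path/to/model.rkllm"
total = 0
start = time.monotonic()
with open(path, "rb") as f:
    while chunk := f.read(16 * 1024 * 1024):  # 16 MiB chunks
        total += len(chunk)
elapsed = time.monotonic() - start
print(f"{total / 1024**3:.2f} GiB in {elapsed:.1f} s "
      f"-> {total / 1024**2 / elapsed:.0f} MiB/s")
```

If the measured rate is only tens of MiB/s, moving the model to eMMC or NVMe should cut load time roughly proportionally.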