zhtroy opened 7 months ago
what's the rknnrt version?
I cloned rknn-toolkit2 and compiled the example from rknn-toolkit2/rknpu2/examples/rknn_matmul_api_demo, so I think the rknnrt version is 1.6, according to the documentation.
I think that function is simply broken / not working as advertised, judging from the message `Not support core mask: 3`.
I tried to run the demo on NPU core0 and core1 on an RK3588.
AFAIK, RKNN 1.6.0 does not support multi-core co-working for a single matrix multiplication, but the runtime will automatically distribute independent matrix multiplications (and other model inferences) onto idle NPU cores. You can run 3 matrix multiplications on 3 threads and they will be distributed correctly.
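A minimal sketch of the host-side pattern described above: issue independent multiplications from separate threads and let the runtime schedule them. This is not the RKNN API itself; `run_matmul` here uses NumPy as a placeholder for what would be an `rknn_matmul_run` call per thread in real code.

```python
# Sketch only: three independent matmuls issued from three threads.
# In real RKNN code each worker would hold its own matmul context and
# call the runtime; np.matmul is a stand-in for illustration.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_matmul(a, b):
    # Placeholder for a per-thread rknn_matmul_run call.
    return a @ b

rng = np.random.default_rng(0)
mats = [(rng.standard_normal((64, 128)), rng.standard_normal((128, 32)))
        for _ in range(3)]

# Three independent jobs on three threads; per the comment above, the
# runtime would spread these across idle NPU cores on its own.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda ab: run_matmul(*ab), mats))
```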
So, with no parallel single matrix multiplication, for A*B=C I'll break the A matrix into 3 smaller matrices A1, A2, A3 (row-wise) and merge the results C1, C2, C3.
@zhtroy Yes. And I've benchmarked the performance given different matrix dimensions. You want to break the matrix across the columns to achieve better performance.
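To illustrate the bookkeeping of the two splits being discussed (NumPy stands in for the NPU matmul; which variant is faster on the RK3588 is an empirical claim from the benchmark above, not something this sketch shows):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((96, 64))
B = rng.standard_normal((64, 48))
C_ref = A @ B

# Row-wise split: partition A into 3 row blocks. Each A_i @ B is an
# independent multiplication; the partial results stack vertically.
C_rows = np.vstack([a_i @ B for a_i in np.vsplit(A, 3)])

# Column-wise split: partition B into 3 column blocks. Each A @ B_j is an
# independent multiplication; the partial results stack horizontally.
C_cols = np.hstack([A @ b_j for b_j in np.hsplit(B, 3)])
```

Both decompositions reproduce the full product exactly, so the choice between them is purely a performance question for the NPU.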
See my post about speed under different shapes and discussions in llama.cpp
Also it's messy since there are only 3 NPU cores. You can either break it into just 2 pieces, or use all 3 NPU cores plus some CPU to split the matrices into 4 pieces. Pick your poison.
I'm running rknn_matmul_api_demo. I tried to run the demo on NPU core0 and core1 on an RK3588, but it failed.
Modified as below:
Result: