Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

ARM SVE #1997

Open Artoria2e5 opened 3 years ago

Artoria2e5 commented 3 years ago

ARM SVE and RISC-V Vector are similar in spirit, and qemu has support. Maybe a simple layer could be written?

baryluk commented 3 years ago

If you set a proper -march and -O3, gcc and clang should autovectorize suitable loops for SVE, and they are pretty good at it. Try: -march=armv8-a+sve -O3.
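As a rough illustration of the kind of loop this applies to, here is a minimal sketch of an element-wise kernel written in plain C++. The function name `leaky_relu` and its signature are made up for this example and are not taken from ncnn. Whether a given compiler version actually emits SVE code for it is something to check with the vectorization report rather than assume.

```cpp
// Build for SVE (flags from the comment above):
//   g++ -O3 -march=armv8-a+sve -c leaky_relu.cpp
// To see which loops were vectorized, add -fopt-info-vec (GCC)
// or -Rpass=loop-vectorize (Clang).
#include <cstddef>

// Hypothetical example kernel, not ncnn code.
// __restrict (a GCC/Clang/MSVC extension) tells the compiler that
// in and out do not alias, which makes the loop easier to vectorize.
void leaky_relu(const float* __restrict in, float* __restrict out,
                std::size_t n, float slope)
{
    // No loop-carried dependency and unit-stride access: a typical
    // candidate for autovectorization.
    for (std::size_t i = 0; i < n; ++i)
    {
        float v = in[i];
        out[i] = v > 0.0f ? v : v * slope;
    }
}
```

The loop does not mention a vector width anywhere, so the same source can be built for NEON, SVE, or RISC-V Vector by changing only the `-march` flag.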

Artoria2e5 commented 3 years ago

This is the conclusion I came to too. I mean, the same can be said for a lot of the other asm code here, and most of the difference is really due to the hard-coded elempack sizes.

The main problem is really about making the elempack stuff more flexible. And for the RISC arches, plumbing through alignment.
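To make the elempack point concrete, here is a minimal sketch of what picking the pack size at run time could look like, instead of hard-coding 4 or 8. Both `choose_elempack` and `pack_channels` are hypothetical helpers written for this example, not ncnn functions, and the layout (`[channels][size]` repacked to `[channels / pack][size][pack]`) is an assumption about what a packed layout looks like. On SVE hardware the vector width could be queried at run time, for example with `svcntw()` from `<arm_sve.h>`; here it is passed in as a plain integer so the sketch builds anywhere.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: start from the number of lanes in one vector
// register and halve it until it divides the channel count.
int choose_elempack(int vector_bits, int elem_bits, int channels)
{
    int pack = vector_bits / elem_bits;
    while (pack > 1 && channels % pack != 0)
        pack /= 2;
    return pack < 1 ? 1 : pack;
}

// Hypothetical helper: repack a [channels][size] buffer so that `pack`
// consecutive channels are interleaved: [channels / pack][size][pack].
// Assumes channels is a multiple of pack.
std::vector<float> pack_channels(const std::vector<float>& src,
                                 int channels, int size, int pack)
{
    std::vector<float> dst(src.size());
    for (int c = 0; c < channels; ++c)
    {
        for (int i = 0; i < size; ++i)
        {
            std::size_t group = static_cast<std::size_t>(c / pack);
            std::size_t lane = static_cast<std::size_t>(c % pack);
            std::size_t idx = (group * size + i) * pack + lane;
            dst[idx] = src[static_cast<std::size_t>(c) * size + i];
        }
    }
    return dst;
}
```

With a 128-bit vector and 32-bit floats this gives a pack size of 4 when the channel count allows it; wider vectors give larger packs. Kernels that index through `pack` rather than a literal constant would then not need a separate copy per vector width.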

baryluk commented 3 years ago

I think the issue is that there is no real hardware to test any hand-written code on at the moment. I mean, there is the Fujitsu A64FX, but it is super hard to get a development system for it. Then there is an ARM simulator, but it is a slow and tedious process.