OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

[RFC] integrate ruapu for runtime cpu isa extension detection #4573

Open nihui opened 6 months ago

nihui commented 6 months ago

Hello

OpenBLAS uses operating-system-specific methods (parsing /proc/cpuinfo) and architecture-specific methods (x86 CPUID) to obtain the CPU's ISA extension information at runtime and dynamically select the optimized code path.
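To make the /proc/cpuinfo approach concrete, here is a minimal sketch in C of a whole-token flag lookup. It is not OpenBLAS's actual detection code: the helper names `line_has_flag` and `cpuinfo_has_flag` are hypothetical, and it assumes the Linux layout where x86 lists features on a "flags" line and Arm on a "Features" line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears in `line` as a whole
 * token, so that asking for "avx" does not match "avx2". */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* keep searching after a partial match */
    }
    return 0;
}

/* Hypothetical helper: scan /proc/cpuinfo (Linux only) for a feature flag.
 * Returns 0 when the file cannot be opened, which is exactly the case
 * where a different detection method is needed. */
int cpuinfo_has_flag(const char *flag)
{
    char buf[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(buf, sizeof buf, f) != NULL) {
        /* x86 kernels print "flags", Arm kernels print "Features" */
        if (strncmp(buf, "flags", 5) == 0 || strncmp(buf, "Features", 8) == 0)
            found = line_has_flag(buf, flag);
    }
    fclose(f);
    return found;
}
```

A caller would write something like `if (cpuinfo_has_flag("avx2")) { ... }`. The result only reflects what the kernel chooses to report, and the file does not exist on Windows or macOS.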

The neural network acceleration library ncnn ( https://github.com/Tencent/ncnn ) uses similar strategies, but these alone may not be enough to cover more systems and architectures.

Therefore, I recommend integrating ruapu ( https://github.com/nihui/ruapu ) into OpenBLAS. ruapu is a single-header C library. It detects CPU ISA extension support by executing instructions and catching the resulting SIGILL. This approach works on many operating systems, such as Linux, Windows, and macOS, and detects support more directly and accurately. Sometimes /proc/cpuinfo or x86 CPUID may lie to us ;)
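The trapping idea can be sketched in a few lines of POSIX C. This is an illustration of the general technique, not ruapu's implementation: `probe_runs`, `probe_supported` and `probe_unsupported` are hypothetical names, and the "unsupported" probe uses `raise(SIGILL)` as a portable stand-in for an instruction the CPU does not implement. A real probe would contain the encoded instruction under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and jump back to the checkpoint. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: run `probe` and return 1 if it completed,
 * or 0 if it raised SIGILL. Not thread-safe (one global jump buffer). */
int probe_runs(void (*probe)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* savemask = 1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0) {
        probe();
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL); /* put the previous handler back */
    return ok;
}

/* Stand-ins for real instruction probes (assumptions for illustration). */
void probe_supported(void) { }
void probe_unsupported(void) { raise(SIGILL); }
```

Here `probe_runs(probe_supported)` returns 1 and `probe_runs(probe_unsupported)` returns 0. The sketch is POSIX-only, so a Windows build would need a different trapping mechanism.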

Comments are welcome on whether ruapu is suitable for the project, as are any other suggestions.

brada4 commented 6 months ago

Just that it cannot tell apart haswell from zen

martin-frbg commented 6 months ago

Thanks, interesting project for sure. (Though we tend to use cpuinfo and similar only for direct identification of the CPU model. I'm not sure whether instruction trapping offers an advantage over querying CPU capability registers for instruction set extensions?)

nihui commented 6 months ago

https://github.com/nihui/ruapu?tab=readme-ov-file#features

ruapu is not intended to replace cpuinfo or register-based detection; it is a complementary detection method. Its main purpose is to provide a unified way to detect features where conventional methods such as cpuinfo are not available, for example on Windows on Arm or when detecting RISC-V vendor extensions.

ruapu currently cannot identify CPU core architectures, such as Skylake, Zen 3, or Cortex-A75. I plan to complete CPU ISA extension detection first, and then add other information as needed.

brada4 commented 6 months ago

You always need CPUID bits. https://en.wikipedia.org/wiki/FMA_instruction_set#CPUs_with_FMA4

martin-frbg commented 6 months ago

I must admit I am not aware of the situation around Windows on Arm - currently waiting for a CI solution to become available for that platform. But from what I've seen it would probably be sufficient for OpenBLAS to support a generic ARMV8 target, and possibly detect SVE availability (later). Finding out RISC-V extensions, in particular the presence (and version) of vector support, would indeed be a valuable feature where there appears to be only sketchy support depending on device and Linux kernel version
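As a rough illustration of why string-based RISC-V detection is limited, the sketch below checks for a single-letter extension in an ISA string of the form Linux prints on the "isa" line of /proc/cpuinfo. The helper name `riscv_isa_has_letter` is hypothetical, and it assumes a lowercase string with the single-letter extensions placed directly after the `rv32` or `rv64` prefix.

```c
#include <string.h>

/* Hypothetical helper: return 1 if single-letter extension `ext` appears
 * in the base part of an ISA string such as "rv64imafdcv_zba_zbb".
 * Scanning stops at '_' and at 's', 'x' or 'z', which start multi-letter
 * extension names, so only letters ordered before those are found. */
int riscv_isa_has_letter(const char *isa, char ext)
{
    const char *p;

    if (strncmp(isa, "rv32", 4) != 0 && strncmp(isa, "rv64", 4) != 0)
        return 0;
    for (p = isa + 4; *p != '\0'; ++p) {
        if (*p == '_' || *p == 's' || *p == 'x' || *p == 'z')
            break;
        if (*p == ext)
            return 1;
    }
    return 0;
}
```

A check such as `riscv_isa_has_letter(isa, 'v')` only shows that the kernel reports a vector extension. The letter alone says nothing about which version of the vector specification the hardware implements.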