Open pengbohua opened 11 months ago
Hello,
Any chance to have PPC64LE support for this library?
Thanks you
I ran into the same issue. It seems that there are some redundant includes in ops.cu. I was able to build after commenting out these lines (and after setting NO_WARN_X86_INTRINSICS):
diff --git a/csrc/ops.cu b/csrc/ops.cu
index 9776121..1137380 100644
--- a/csrc/ops.cu
+++ b/csrc/ops.cu
@@ -7,12 +7,12 @@
#include <kernels.cuh>
#include <cub/device/device_scan.cuh>
#include <limits>
-#include <BinSearch.h>
+// #include <BinSearch.h>
#include <cassert>
-#include <common.h>
+// #include <common.h>
-using namespace BinSearch;
+// using namespace BinSearch;
using std::cout;
using std::endl;
While it builds, I'm still testing if the built binary works as intended.
Not 100% sure, but I guess we have a similar issue to what is describe here https://github.com/dealii/dealii/issues/7328#issuecomment-429466699 and the reason here: https://github.com/dealii/dealii/issues/7328#issuecomment-444507314
Using -DNO_WARN_X86_INTRINSICS
should translate Intel intrinsic code to VSX code (which it does), but throwing a lot of identifier "__builtin_*" is undefined
errors:
[...]
/opt/at15.0/lib/gcc/powerpc64le-linux-gnu/11.4.1/include/mmintrin.h(178): error: too many initializer values
{ __m1, __m2 };
^
/opt/at15.0/lib/gcc/powerpc64le-linux-gnu/11.4.1/include/mmintrin.h(182): error: identifier "__builtin_vec_packs" is undefined
__vresult = __builtin_vec_packs (__vm1, __vm1);
^
/opt/at15.0/lib/gcc/powerpc64le-linux-gnu/11.4.1/include/mmintrin.h(183): error: expression must have pointer-to-object type but it has type "int"
return (__m64) ((__attribute__((altivec(vector__))) long long) __vresult)[0];
^
/opt/at15.0/lib/gcc/powerpc64le-linux-gnu/11.4.1/include/mmintrin.h(203): error: too many initializer values
{ __m1, __m2 };
^
/opt/at15.0/lib/gcc/powerpc64le-linux-gnu/11.4.1/include/mmintrin.h(207): error: identifier "__builtin_vec_packs" is undefined
__vresult = __builtin_vec_packs (__vm1, __vm1);
^
[...]
If I understand this correct nvcc does not know about alitvec and should therefore not see any SIMD code.
I'm wondering more about the fact that include/SIMD.h
and include/Algo-Direct2.h
are using x86 intrinsics, but there is no scalar code for this as a fallback solution for unsupported platforms such as ppc64le or aarch64
Does it make sense to create a PR using scalar code or am I missing something?
I would also appreciate a solution in this regard.
Hi, Tim. Thank you for bring such a cool quantization tool to the community. However, I met a issue when trying to compile bnb from source code on PPC64LE CPUs of my school.
The errors regarding x86_intrinsics which can be disabled via the flag (-DNO_WARN_X86_INTRINSICS) used, are there for a reason unfortunately, as the intrinsics which result in errors later on are not available in the GCC/G++ build on ppc64le CPUs.
The warning refers to a comment in the header file, which reads as follows:
The bitsandbytes code, inside include/SIMD.h and include/Algo-Direct2.h use these intrinsics, which are not portable across CPU architectures. These are included by BinSearch.h, which is included by common.h, cpu_ops.cpp and ops.cu. without any CPU architecture guarding. The PPC64LE version(s) of GCC etc do attempt to automatically convert these intrinsics to the PPC64LE equivalents when DNO_WARN_X86_INTRINSICS is set, but unfortunately this does not succeed for this code, and it will only work on x86_64 CPUs.
For this to build on PPC64LE, this will require changes to the source code, to either replace these intrinsics with portable versions which are PPC64LE compatible, or guard them out with macros so they can be disabled, and a PPC64LE compatible alternative put in place.