Hello, the work of connecting ABQKernel to Torch and installing it via setup.py has not been completed yet, so the setup.py failure is expected. If you want to test kernel performance, please refer to the Kernel benchmark and E2E benchmark sections of the README for how to compile the source code.
I get it. I have another question: what is the main difference between the bmma and wmma versions of the ABQ GEMM kernel? The wmma version uses NVIDIA's official wmma wrapper from mma.h, and the bmma version uses a custom wrapper around PTX asm. Is there any other difference?
The wmma API sits at a higher abstraction level, which makes TensorCore easier to use, but it has the following problems:

1. Not all binary mma instructions are wrapped by wmma. For example, bmma supports three shapes, while binary wmma only supports the <8, 8, 128> configuration.
2. wmma encapsulates the load_matrix and store_matrix operations, which prevents us from designing thread-level memory access in a fine-grained way (such as applying swizzling strategies to avoid bank conflicts). This is the direct reason we implemented the MMA-level optimization.
3. We want this open-source project to show the community how a core operator such as GEMM can be optimized at both the WMMA and the MMA level, so the two can be compared directly.

A minimal sketch of the two levels is shown below.
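To make the contrast concrete, here is a minimal sketch of both levels. This is not the actual ABQ-LLM code; the wrapper name `bmma_m16n8k256` is made up for illustration. The first kernel uses the official binary wmma API from mma.h, which only exposes the 8x8x128 shape and hides how fragments map to memory; the second wraps a raw PTX bmma instruction (the m16n8k256 shape, one of the shapes wmma does not expose), so the caller controls exactly which registers are fed and how shared memory is laid out.

```cuda
#include <mma.h>
using namespace nvcuda;

// WMMA level: binary (b1) matrices, only the 8x8x128 shape is exposed.
// load/store_matrix_sync hide the per-thread memory layout from us.
__global__ void wmma_b1_gemm_tile(const unsigned *A, const unsigned *B, int *C) {
  wmma::fragment<wmma::matrix_a, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::col_major> b_frag;
  wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

  wmma::fill_fragment(c_frag, 0);
  wmma::load_matrix_sync(a_frag, A, 128);  // ldm counted in b1 elements (bits)
  wmma::load_matrix_sync(b_frag, B, 128);
  wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag,
                  wmma::experimental::bmmaBitOpAND,
                  wmma::experimental::bmmaAccumulateOpPOPC);
  wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
}

// MMA (PTX) level: a hypothetical wrapper around one bmma shape that wmma
// does not expose. The caller now owns register allocation and can load
// a/b from a swizzled shared-memory layout to avoid bank conflicts.
__device__ __forceinline__
void bmma_m16n8k256(int c[4], const unsigned a[4], const unsigned b[2]) {
  asm volatile(
      "mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.and.popc "
      "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%0,%1,%2,%3};\n"
      : "+r"(c[0]), "+r"(c[1]), "+r"(c[2]), "+r"(c[3])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]));
}
```

At the wmma level the fragment-to-memory mapping is opaque, so a custom swizzled layout cannot be expressed; with the PTX wrapper, the loads that fill `a` and `b` are written by hand and can be ordered however best avoids bank conflicts.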
I built the wheel by following these steps, but I got the above errors with CUDA 12.1 and the conda Python env abq-llm.
Could the repo provide a Dockerfile or image to avoid compile problems?