Hello, the work of connecting ABQKernel to Torch and installing it via setup.py has not been completed yet, so the setup.py failure is expected. If you want to test kernel performance, please refer to the Kernel benchmark and E2E benchmark sections of the README for how to compile the source code.
I get it. I have another question: what is the main difference between the bmma and wmma versions of the ABQ GEMM kernel? The wmma version uses NVIDIA's official wmma wrapper from mma.h, and the bmma version uses a custom wrapper around PTX asm. Is there any other difference?
The wmma API sits at a higher abstraction level, which makes TensorCore easier to use, but it has the following problems:

1. Not all binary mma instructions are wrapped by wmma. For example, bmma supports three shapes, while binary wmma only supports the <8, 8, 128> configuration.
2. wmma encapsulates the load_matrix and store_matrix operations, which prevents us from designing thread-level memory access in a fine-grained way (such as applying swizzling strategies to avoid bank conflicts). This is the direct reason we implemented the MMA-level optimization.
3. We want this open-source project to show the community how a core operator such as GEMM can be optimized at both the WMMA and the MMA level, so the two can be compared directly.

A minimal sketch of the two levels is shown below.
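To make the contrast concrete, here is a minimal sketch of both levels. This is not the actual ABQ-LLM code; the wrapper name `bmma_m16n8k256` is made up for illustration. The first kernel uses the official binary wmma API from mma.h, which only exposes the 8x8x128 shape and hides how fragments map to memory; the second wraps a raw PTX bmma instruction (the m16n8k256 shape, one of the shapes wmma does not expose), so the caller controls exactly which registers are fed and how shared memory is laid out.

```cuda
#include <mma.h>
using namespace nvcuda;

// WMMA level: binary (b1) matrices, only the 8x8x128 shape is exposed.
// load/store_matrix_sync hide the per-thread memory layout from us.
__global__ void wmma_b1_gemm_tile(const unsigned *A, const unsigned *B, int *C) {
  wmma::fragment<wmma::matrix_a, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::col_major> b_frag;
  wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

  wmma::fill_fragment(c_frag, 0);
  wmma::load_matrix_sync(a_frag, A, 128);  // ldm counted in b1 elements (bits)
  wmma::load_matrix_sync(b_frag, B, 128);
  wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag,
                  wmma::experimental::bmmaBitOpAND,
                  wmma::experimental::bmmaAccumulateOpPOPC);
  wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
}

// MMA (PTX) level: a hypothetical wrapper around one bmma shape that wmma
// does not expose. The caller now owns register allocation and can load
// a/b from a swizzled shared-memory layout to avoid bank conflicts.
__device__ __forceinline__
void bmma_m16n8k256(int c[4], const unsigned a[4], const unsigned b[2]) {
  asm volatile(
      "mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.and.popc "
      "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%0,%1,%2,%3};\n"
      : "+r"(c[0]), "+r"(c[1]), "+r"(c[2]), "+r"(c[3])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]));
}
```

At the wmma level the fragment-to-memory mapping is opaque, so a custom swizzled layout cannot be expressed; with the PTX wrapper, the loads that fill `a` and `b` are written by hand and can be ordered however best avoids bank conflicts.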
I built the wheel by following these steps, but I got the above errors with CUDA 12.1 and the conda Python env abq-llm.
Could the repo provide a Dockerfile or image to avoid compile problems?