@Godofnothing Thanks for creating this repository and supporting faster GEMMs.
I am currently working on an AutoGPTQ extension for the SYCL runtime (https://github.com/AutoGPTQ/AutoGPTQ/pull/638). Since the asm instructions built for Marlin come from here, I propose adding an analogous SYCL counterpart to this repository.
I believe this addition would help us (Intel and the SYCL community in general) actively benchmark against the PTX ISA and check for performance gaps. It would also open avenues for using the SYCL runtime on non-Intel hardware. [Creating a draft PR now]
Also tagging @fxmarty (AutoGPTQ) for info. Thanks!