IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
Apache License 2.0
575 stars 45 forks

[SYCL] Add Marlin Kernel for SYCL runtime #33

Open abhilash1910 opened 1 month ago

abhilash1910 commented 1 month ago

@Godofnothing Thanks for creating this repository and supporting faster GEMMs. I am currently working on an AutoGPTQ extension for the SYCL runtime (https://github.com/AutoGPTQ/AutoGPTQ/pull/638). Since the asm instructions for Marlin originate from this repository, I propose adding an analogous SYCL counterpart here. I believe this addition would help us (Intel and SYCL in general) actively benchmark against the PTX ISA and check for performance gaps. It would also open avenues for using the SYCL runtime on non-Intel hardware. [Creating a draft PR now]
Also tagging @fxmarty (AutoGPTQ) for info. Thanks.