fpgaminer / GPTQ-triton

GPTQ inference Triton kernel
Apache License 2.0

1-bit acceleration support #7

Open NicoNico6 opened 1 year ago

NicoNico6 commented 1 year ago

Hi, really good work, and appreciate it a lot.

I am curious whether Triton can support 1-bit acceleration for MMA, and whether that could be further applied to a 1-bit GPTQ?

fpgaminer commented 1 year ago

Thanks.

What do you mean by MMA?

I might add support for more bit widths if there's demand for it. AFAIK 4-bit is "optimal", which is why I've focused the work there so far.
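
For context, here is a minimal sketch (not the repo's actual kernel) of the bit-packing arithmetic that makes 4-bit convenient: eight 4-bit values fit exactly in one int32, so unpacking is a shift and a mask with no value straddling a word boundary. The names and the per-element scale/zero layout below are illustrative assumptions only; the real GPTQ kernels fuse dequantization into the matmul and use group-wise parameters.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def dequant4_kernel(qweight_ptr, scales_ptr, zeros_ptr, out_ptr,
                    n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements

    # Each int32 word holds eight 4-bit values: pick the word, then the nibble.
    word = tl.load(qweight_ptr + offs // 8, mask=mask, other=0)
    shift = (offs % 8) * 4
    nibble = (word >> shift) & 0xF

    # Illustrative per-element scale/zero; real kernels use group-wise params.
    scale = tl.load(scales_ptr + offs, mask=mask, other=0.0)
    zero = tl.load(zeros_ptr + offs, mask=mask, other=0.0)
    tl.store(out_ptr + offs, (nibble.to(tl.float32) - zero) * scale, mask=mask)


def dequant4(qweight, scales, zeros):
    """Unpack int32-packed 4-bit weights to float32 (illustrative helper)."""
    n = scales.numel()
    out = torch.empty(n, device=qweight.device, dtype=torch.float32)
    BLOCK = 1024
    dequant4_kernel[(triton.cdiv(n, BLOCK),)](qweight, scales, zeros, out, n, BLOCK=BLOCK)
    return out
```

Other widths (e.g. 3-bit) don't divide a 32-bit word evenly, which is part of why 4-bit is the sweet spot between accuracy and kernel simplicity.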

NicoNico6 commented 1 year ago

Hi,

By MMA I mean the matrix multiplication API in the Tensor Core library.

Since I am working on binary neural networks, I am wondering whether it is possible to write a 1-bit implementation of LLM acceleration using the Triton library.
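
As a reference point, here is a minimal sketch, assuming weights and activations are binarized to ±1 and packed into integers, of the identity a 1-bit dot product relies on: dot(a, b) = 2 * popcount(xnor(a, b)) - n. This is plain Python just to show the arithmetic; a real kernel would apply it per 32-bit word on the GPU, and the helper names are hypothetical.

```python
def pack_bits(signs):
    """Pack a list of +/-1 values into an int (bit i set when signs[i] == +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_signs, b_signs):
    """Dot product of two +/-1 vectors via XNOR + popcount on packed bits."""
    n = len(a_signs)
    a, b = pack_bits(a_signs), pack_bits(b_signs)
    full_mask = (1 << n) - 1
    matches = bin((~(a ^ b)) & full_mask).count("1")  # popcount of XNOR
    return 2 * matches - n


# Matches the ordinary dot product of the +/-1 vectors:
a = [1, -1, -1, 1, 1]
b = [1, 1, -1, -1, 1]
assert binary_dot(a, b) == sum(x * y for x, y in zip(a, b))
```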

Thanks a lot for your answer and help!