NicoNico6 opened this issue 1 year ago
Thanks.
What do you mean by MMA?
I might add support for more bit widths if there's demand for it. AFAIK, 4-bit is "optimal", which is why I've focused on it in the work so far.
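For context on the bit-width discussion: sub-byte quantized weights are usually stored packed into wider integers, so a kernel loads one 32-bit word and unpacks several weights from it. The sketch below is a generic, hypothetical illustration. The names `pack_bits`/`unpack_bits` and the low-bits-first layout are assumptions, not this repository's actual format. It shows that the same scheme covers 4-bit weights (8 per word) and 1-bit weights (32 per word).

```python
def pack_bits(values, bits=4, word_bits=32):
    """Pack unsigned ints of width `bits` into `word_bits`-bit words.

    Hypothetical layout: element i of each group goes to bits [i*bits, (i+1)*bits).
    If word_bits is not a multiple of bits, the leftover high bits stay zero.
    """
    per_word = word_bits // bits
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for i, v in enumerate(values[start:start + per_word]):
            if not 0 <= v <= mask:
                raise ValueError(f"value {v} does not fit in {bits} bits")
            word |= v << (i * bits)
        words.append(word)
    return words


def unpack_bits(words, count, bits=4, word_bits=32):
    """Inverse of pack_bits: recover the first `count` packed values."""
    per_word = word_bits // bits
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for i in range(per_word):
            if len(out) == count:
                return out
            out.append((word >> (i * bits)) & mask)
    return out


vals = [1, 15, 0, 7, 3, 8, 2, 9, 5]
words = pack_bits(vals, bits=4)  # 9 values -> 2 words (8 per word)
print([hex(w) for w in words])
print(unpack_bits(words, len(vals), bits=4) == vals)
```

Lowering `bits` only changes how many values share a word and the width of the mask. The harder part in a real kernel is making the unpacked values feed the matrix-multiply units efficiently.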
Hi,
By MMA, I mean the matrix multiply-accumulate (MMA) API for Tensor Cores.
Since I am working on binary neural networks, I am wondering whether it is possible to write a 1-bit kernel for LLM acceleration using Triton.
Thanks a lot for your answer and help!
Hi, really good work, and I appreciate it a lot.
I am curious whether Triton can support 1-bit MMA acceleration, and whether this could be extended to 1-bit GPTQ.
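On the 1-bit question: the usual trick in binary neural networks (XNOR-Net-style kernels) is to encode ±1 values as bits and replace multiply-accumulate with XOR plus a population count. For length-n vectors, matching bits contribute +1 and differing bits contribute -1, so dot = n - 2 * popcount(a XOR b). The pure-Python sketch below only demonstrates that arithmetic. It is not a Triton kernel and says nothing about whether Triton exposes 1-bit Tensor Core MMA. The helper names are hypothetical.

```python
def to_bits(signs):
    """Encode a list of +1/-1 values as an int bitmask (bit i set where signs[i] == +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("binary values must be +1 or -1")
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1,-1} vectors given as bitmasks."""
    diff = (a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever the signs differ
    return n - 2 * bin(diff).count("1")        # popcount, portable to Python < 3.10


def binary_matmul(A, B):
    """Multiply an m x k by a k x n matrix, both with entries in {+1, -1}."""
    k = len(B)
    rows = [to_bits(r) for r in A]
    cols = [to_bits([B[i][j] for i in range(k)]) for j in range(len(B[0]))]
    return [[binary_dot(r, c, k) for c in cols] for r in rows]


A = [[1, -1, 1], [-1, -1, 1]]
B = [[1, -1], [1, 1], [-1, 1]]
print(binary_matmul(A, B))
```

On a GPU, the XOR and popcount would run over packed 32-bit words (or on hardware that supports binary MMA), which is where the speedup over 4-bit or 16-bit arithmetic would come from.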