casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License
1.68k stars 202 forks

Support State Space Models #262

Open abhinavkulkarni opened 9 months ago

abhinavkulkarni commented 9 months ago

Hi @casper-hansen,

Please add support for state space models such as StripedHyena, recently released by TogetherAI (and the authors of Flash Attention 2). These models reportedly handle very long contexts well and are much easier to train and run inference on than transformer-only models such as Llama 2. Please note that StripedHyena is a hybrid: some of its layers are SSM blocks, whereas others are the usual transformer blocks.

More details here: https://www.together.ai/blog/stripedhyena-7b

Thanks!

casper-hansen commented 9 months ago

Hi @abhinavkulkarni, thanks for posting this. I talked with the Striped Hyena team and I am looking to implement it. I have already started on the branch below, but it needs more testing.

https://github.com/casper-hansen/AutoAWQ/tree/striped_hyena