AniZpZ / AutoSmoothQuant

An easy-to-use package for implementing SmoothQuant for LLMs
MIT License
82 stars 7 forks

Can we use W8A8B8O8Linear in LLaMA model? #26

Open peilin-chen opened 4 weeks ago

peilin-chen commented 4 weeks ago

Hello, and thanks for the great project! I have one question: is it possible to use W8A8B8O8Linear for the QKV projections in the LLaMA model instead of W8A8BFP32OFP32Linear? Thanks!
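
To make the question concrete, here is a minimal pure-Python sketch of how the two layer variants would differ. It assumes the class names follow the usual naming scheme (W8A8 = int8 weights and activations, B8 or BFP32 = int8 or FP32 bias, O8 or OFP32 = int8 or FP32 output). The function names, scale parameters, and sample values below are hypothetical and only illustrate the idea; they are not the AutoSmoothQuant kernels or API.

```python
# Illustrative stand-ins only: hypothetical helpers, not the library's kernels.

def clamp_int8(x):
    """Round to an integer (Python rounds ties to even) and saturate to int8."""
    return max(-128, min(127, round(x)))

def int8_matvec(weight_q, x_q):
    """Integer matrix-vector product, accumulated in full integer precision."""
    return [sum(w * x for w, x in zip(row, x_q)) for row in weight_q]

def linear_fp32_out(weight_q, x_q, bias_fp32, dequant_scale):
    """W8A8BFP32OFP32-style (assumed): int8 inputs, FP32 bias, FP32 output."""
    acc = int8_matvec(weight_q, x_q)
    return [a * dequant_scale + b for a, b in zip(acc, bias_fp32)]

def linear_int8_out(weight_q, x_q, bias_q, acc_scale, bias_scale):
    """W8A8B8O8-style (assumed): int8 inputs, int8 bias, output requantized to int8."""
    acc = int8_matvec(weight_q, x_q)
    return [clamp_int8(a * acc_scale + b * bias_scale) for a, b in zip(acc, bias_q)]

# Made-up sample values for a 2x2 layer.
weight_q = [[10, -20], [30, 40]]
x_q = [5, 7]

fp32_out = linear_fp32_out(weight_q, x_q, [0.5, -0.25], 0.01)
int8_out = linear_int8_out(weight_q, x_q, [4, -2], 0.1, 0.5)
saturated = linear_int8_out(weight_q, x_q, [4, -2], 1.0, 0.5)
```

In this sketch the int8-output variant needs an output scale chosen so that results fit in [-128, 127]; with a poorly chosen scale the values saturate, which is the accuracy trade-off behind the question.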