Can we use W8A8B8O8Linear in the LLaMA model?

AniZpZ / AutoSmoothQuant

An easy-to-use package for implementing SmoothQuant for LLMs

MIT License


Open peilin-chen opened 4 weeks ago

peilin-chen commented 4 weeks ago

Hello, thanks for the great project! I have a question: is it possible to use W8A8B8O8Linear for the qkv projection instead of W8A8BFP32OFP32Linear? Thanks!
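To make the question concrete, here is a rough pure-Python sketch of how I understand the two layer types to differ, assuming the class names follow the usual convention (W8A8 = int8 weights and activations, B8/O8 = int8 bias and output, BFP32/OFP32 = fp32 bias and output). The function names and the `alpha`/`beta` scale parameters are hypothetical and only for illustration; this is not the package's actual kernel code.

```python
# Illustrative sketch only: not the AutoSmoothQuant kernels.
# All function names and scale parameters here are hypothetical.

def clamp_int8(v):
    """Round to the nearest integer and saturate to the int8 range."""
    return max(-128, min(127, round(v)))

def int_matvec(x_q, w_q):
    """int8 x int8 matrix-vector product with integer accumulation."""
    return [sum(xi * wi for xi, wi in zip(x_q, row)) for row in w_q]

def linear_fp32_out(x_q, w_q, bias_fp32, alpha):
    """W8A8BFP32OFP32-style: fp32 bias, fp32 output.
    alpha rescales the integer accumulator back to real values."""
    acc = int_matvec(x_q, w_q)
    return [alpha * a + b for a, b in zip(acc, bias_fp32)]

def linear_int8_out(x_q, w_q, bias_q, alpha, beta):
    """W8A8B8O8-style: int8 bias, int8 output.
    alpha and beta fold the output quantization scale into the result."""
    acc = int_matvec(x_q, w_q)
    return [clamp_int8(alpha * a + beta * b) for a, b in zip(acc, bias_q)]

x_q = [10, -20, 30]                 # quantized activations
w_q = [[1, 2, 3], [-4, 5, -6]]      # quantized weights, one row per output
y_fp32 = linear_fp32_out(x_q, w_q, [0.5, -0.25], alpha=0.01)
y_int8 = linear_int8_out(x_q, w_q, [4, -2], alpha=0.1, beta=1.0)
```

In this sketch the only difference is the last step: the fp32 variant returns real values directly, while the int8 variant requantizes, so whatever consumes the qkv output would have to accept int8 tensors and the extra output scale.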