0hq / WebGPT

Run a GPT model in the browser with WebGPU. An implementation of GPT inference in under ~1500 lines of vanilla JavaScript.
https://kmeans.org

Int8 Weights + FP32 Activations GEMM #32

Closed · carsonpo closed this 1 year ago

carsonpo commented 1 year ago

Don't have time to integrate it into your systems in place, but this gives ~3.5x the FLOPs for a very skinny matmul (cached KV inference) and should decrease the model checkpoint size by ~4x. It needs a bit more work for a better absmax calculation (probably vectorwise instead of the obviously suboptimal global one), but the MAE is very reasonable for the setup shown.
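The idea above can be sketched in plain JavaScript: quantize FP32 weights to int8 with a single global absmax scale (the suboptimal variant the comment mentions; a vectorwise scheme would compute one scale per row), then run the matmul with int8 weights against FP32 activations, applying the scale once per output. This is a minimal illustrative sketch, not the actual WGSL kernel from the PR; all function names here are hypothetical.

```javascript
// Hypothetical sketch: global-absmax int8 weight quantization.
// Maps [-absmax, absmax] onto [-127, 127] with one shared scale.
function quantizeAbsmax(weights) {
  let absmax = 0;
  for (const w of weights) absmax = Math.max(absmax, Math.abs(w));
  const scale = absmax / 127; // fp32 value per int8 step
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round(weights[i] / scale);
  }
  return { q, scale };
}

// Int8 weights x FP32 activations (matrix-vector form of the GEMM):
// accumulate int8 * fp32 products, then apply the scale once per row.
// q: int8 weights of shape [rows, cols], acts: fp32 vector of length cols.
function int8Matvec(q, scale, rows, cols, acts) {
  const out = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) {
      acc += q[r * cols + c] * acts[c];
    }
    out[r] = acc * scale;
  }
  return out;
}
```

Storing int8 instead of FP32 is where the ~4x checkpoint-size reduction comes from; the quantization error (the MAE the comment refers to) depends on how well a single absmax represents each row, which is why a per-vector scale would tighten it.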

vercel[bot] commented 1 year ago

The latest updates on your projects:

web-gpt: ✅ Ready — updated May 1, 2023 6:20pm (UTC)
0hq commented 1 year ago

Awesome!