0hq / WebGPT

Run a GPT model in the browser with WebGPU. An implementation of GPT inference in under ~1500 lines of vanilla JavaScript.
https://kmeans.org

add sample script for int8-gemm #31

Closed · carsonpo closed this 1 year ago

carsonpo commented 1 year ago

I don't have time to wire this into your existing code, but it gives roughly 3.5x the FLOPs for a very skinny matmul (cached KV inference) and should cut the model checkpoint size by about 4x. It still needs a better absmax calculation (probably vectorwise instead of the clearly suboptimal global one), but the MAE is very reasonable for the setup shown.
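
For context, here is a minimal sketch of absmax int8 quantization contrasting a single global scale with per-row ("vectorwise") scales. This is not the PR's actual script; the function names and the tiny example matrix are illustrative only.

```javascript
// Quantize with one scale for the whole matrix: q = round(w * 127 / absmax).
function quantizeGlobal(weights) {
  let absmax = 0;
  for (const w of weights) absmax = Math.max(absmax, Math.abs(w));
  const scale = absmax / 127 || 1; // avoid divide-by-zero on an all-zero matrix
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) q[i] = Math.round(weights[i] / scale);
  return { q, scale };
}

// Quantize each row with its own absmax scale, which limits the damage
// a single outlier can do compared to one global scale.
function quantizeRowwise(weights, rows, cols) {
  const q = new Int8Array(weights.length);
  const scales = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let absmax = 0;
    for (let c = 0; c < cols; c++) {
      absmax = Math.max(absmax, Math.abs(weights[r * cols + c]));
    }
    const scale = absmax / 127 || 1;
    scales[r] = scale;
    for (let c = 0; c < cols; c++) {
      q[r * cols + c] = Math.round(weights[r * cols + c] / scale);
    }
  }
  return { q, scales };
}

// Mean absolute error after dequantizing with the global scale.
function maeGlobal(weights, q, scale) {
  let err = 0;
  for (let i = 0; i < weights.length; i++) err += Math.abs(weights[i] - q[i] * scale);
  return err / weights.length;
}

// Example: quantize a tiny 2x3 matrix and report the global-scale MAE.
const w = new Float32Array([0.3, -1.2, 0.05, 2.4, -0.7, 0.9]);
const { q, scale } = quantizeGlobal(w);
console.log('MAE (global absmax):', maeGlobal(w, q, scale));
```

Storing weights as int8 plus a scale is also what yields the ~4x checkpoint-size reduction relative to float32.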

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| web-gpt | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 1, 2023 5:25pm |
0hq commented 1 year ago

Sweet! What's with the change to params_gpt?