0hq / WebGPT

Run GPT models in the browser with WebGPU. An implementation of GPT inference in under ~1500 lines of vanilla JavaScript.
https://kmeans.org

How slow is slow? #49

Open · flatsiedatsie opened this issue 6 months ago

flatsiedatsie commented 6 months ago

I downloaded the GitHub repo and placed it on a localhost server.

I opened the page and clicked the "Load GPT2 117Mb" model button.

I've been waiting for a few minutes now, with the output stuck on `Loading token embeddings...`. Is that normal behaviour?

```
Loading model from folder: gpt2
Loading params...
Warning: Buffer size calc result exceeds GPU limit, are you using this value for a tensor size? 50257 768 1 154389504
bufferSize @ model.js:510
loadParameters @ model.js:298
await in loadParameters (async)
loadModel @ model.js:276
initialize @ model.js:32
await in initialize (async)
loadModel @ gpt/:105
onclick @ gpt/:23
Params: {n_layer: 12, n_head: 12, n_embd: 768, vocab_size: 50257, n_ctx: 1024, …}
Loading token embeddings...
```
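
The trailing numbers in the warning are the embedding tensor's shape and byte size: 50257 (vocab_size) × 768 (n_embd) × 1 × 4 bytes per float32 = 154,389,504 bytes, about 147 MiB, which exceeds WebGPU's default `maxStorageBufferBindingSize` of 134,217,728 bytes (128 MiB). Below is a minimal diagnostic sketch using the standard WebGPU API to inspect the adapter's limits and request a device with a raised binding limit where the hardware allows it; it is generic illustration code, not part of WebGPT, and the function name `checkGpuLimits` is made up:

```js
// Diagnostic sketch (not WebGPT code): check whether the token-embedding
// buffer fits within this adapter's limits, then request a device with
// the largest binding size the adapter can actually support.
async function checkGpuLimits() {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU is not available in this browser.");

  // 50257 (vocab_size) * 768 (n_embd) * 4 bytes per float32,
  // matching the 154389504 reported in the warning above.
  const embeddingBytes = 50257 * 768 * 4;

  const { maxStorageBufferBindingSize, maxBufferSize } = adapter.limits;
  console.log("adapter maxStorageBufferBindingSize:", maxStorageBufferBindingSize);
  console.log("adapter maxBufferSize:", maxBufferSize);

  if (embeddingBytes > maxStorageBufferBindingSize) {
    console.warn(
      "Embedding tensor (" + embeddingBytes + " bytes) exceeds the " +
      "adapter's storage-buffer binding limit; it cannot be bound whole."
    );
  }

  // Requesting a value above the adapter's limit rejects, so clamp with
  // Math.min to stay within what the adapter reports as supported.
  return adapter.requestDevice({
    requiredLimits: {
      maxStorageBufferBindingSize: Math.min(embeddingBytes, maxStorageBufferBindingSize),
      maxBufferSize: Math.min(embeddingBytes, maxBufferSize),
    },
  });
}
```

If the adapter cannot offer more than the default 128 MiB binding size, the over-limit embedding buffer is a plausible reason the loader stalls at `Loading token embeddings...`, though the log alone does not confirm that is the cause.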