cmp-nct / ggllm.cpp

Falcon LLM ggml framework with CPU and GPU support

Cuda performance broadcast #32

Closed: cmp-nct closed this 1 year ago

cmp-nct commented 1 year ago

Biggest changes:

Medium changes:

Small changes:

cmp-nct commented 1 year ago

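At 1000 tokens on a single GPU I have these speeds now:

* 40 tokens/second for 7B

* 17 tokens/second for 40B

At around 50 tokens:

* 55 tokens/second for 7B

* 24 tokens/second for 40B (4090 using 4K quantization and squeezing it into VRAM using negative reserved config)

That is already quite respectable.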

jploski commented 1 year ago

> At 1000 tokens on a single GPU I have these speeds now:
>
> * 40 tokens/second for 7B
>
> * 17 tokens/second for 40B
>
> At around 50 tokens:
>
> * 55 tokens/second for 7B
>
> * 24 tokens/second for 40B (4090 using 4K quantization and squeezing it into VRAM using negative reserved config)
>
> That is already quite respectable.

Around here I call it deeply impressive. :-)