cmp-nct closed this 1 year ago
At 1000 tokens on a single GPU I have these speeds now:

* 40 tokens/second for 7B
* 17 tokens/second for 40B

At around 50 tokens:

* 55 tokens/second for 7B
* 24 tokens/second for 40B

(Measured on a 4090 using 4K quantization and squeezing it into VRAM using a negative reserved config.)

That is already quite respectable
Around here I call it deeply impressive. :-)
Biggest changes:
Medium changes:
Small changes: