PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0
660 stars, 80 forks

Proof of Concept: Recovering the missing KV block space #262

Closed: 50h100a closed this 3 months ago

50h100a commented 3 months ago

This only works on Mistral so far, and I haven't tested it comprehensively: I've only tried a few replies on single-GPU setups.

Loading Mistral 7B: on an A6000, this takes us from 10257 to 12257 blocks. On a 4090, we go from "cannot run" (1227 blocks) to "can run" (2227 blocks).
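As a rough sanity check, the block counts above can be converted to memory with a back-of-the-envelope calculation. This is a sketch under stated assumptions, none of which come from this PR: Mistral 7B's published config (32 layers, 8 KV heads, head dim 128), 16 tokens per block, an fp16/bf16 KV cache, and a 32768-token max context. Under those assumptions each block is 2 MiB, so the A6000 gain (2000 blocks) is about 3.9 GiB and the 4090 gain (1000 blocks) about 1.95 GiB. It also suggests one plausible reading of "cannot run": a single max-length sequence would need 2048 blocks, which 1227 cannot hold but 2227 can.

```python
# Back-of-the-envelope KV-cache block math for Mistral 7B.
# ASSUMPTIONS (not taken from Aphrodite's code): Mistral 7B's published config,
# 16 tokens per block, fp16/bf16 KV cache, 32768-token max context.
NUM_LAYERS = 32
NUM_KV_HEADS = 8       # grouped-query attention: 8 KV heads
HEAD_DIM = 128         # hidden_size 4096 / 32 attention heads
BLOCK_SIZE = 16        # tokens per KV block (assumed)
DTYPE_BYTES = 2        # fp16/bf16
MAX_MODEL_LEN = 32768  # assumed max context length


def kv_block_bytes() -> int:
    """Bytes for one block: K and V tensors across every layer."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES
    return per_token * BLOCK_SIZE


def blocks_to_gib(n_blocks: int) -> float:
    """Convert a block count to GiB of KV-cache memory."""
    return n_blocks * kv_block_bytes() / 2**30


def blocks_for_one_sequence(max_len: int = MAX_MODEL_LEN) -> int:
    # Ceiling division: a partially filled block still occupies a whole block.
    return -(-max_len // BLOCK_SIZE)


if __name__ == "__main__":
    print(f"bytes per block: {kv_block_bytes()}")
    needed = blocks_for_one_sequence()
    for gpu, before, after in [("A6000", 10257, 12257), ("4090", 1227, 2227)]:
        gained = after - before
        print(f"{gpu}: +{gained} blocks (~{blocks_to_gib(gained):.2f} GiB); "
              f"fits one {MAX_MODEL_LEN}-token sequence: "
              f"before={before >= needed}, after={after >= needed}")
```

Note that the recovered space is not a fixed block count across GPUs (2000 blocks on the A6000 versus 1000 on the 4090), so if the assumed block size holds, the missing space appears to scale with the card rather than being a constant overhead.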

I'm not proposing this as the fix, but it does point to a very specific source of the problem.

AlpinDale commented 3 months ago

Will continue in #263, closing this for now.