AnswerDotAI / cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
https://www.answer.ai/posts/2024-08-01-cold-compress.html
BSD 3-Clause "New" or "Revised" License

Implements window KV-Cache Compression Strategy #9

Closed griff4692 closed 4 months ago

griff4692 commented 5 months ago

@griff4692 left some comments but haven't finished going through everything just yet!

Will also follow up with you to ensure I'm on the same page for certain design decisions

Thanks! No rush - I'll update as you make suggestions

griff4692 commented 4 months ago

Squashed everything into a single commit to make it easier to follow

griff4692 commented 4 months ago

@haileyschoelkopf - removed mutable Python function args and made a few other minor edits. Going to merge now and rebase my heavy hitters code onto the new main.
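
For readers unfamiliar with the "mutable python function args" cleanup mentioned above, here is a minimal sketch of the usual pitfall. It assumes the change concerned mutable default arguments, which the thread does not state explicitly. The function names `add_token_bad` and `add_token_good` are hypothetical and do not come from the cold-compress codebase.

```python
def add_token_bad(token, cache=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `cache` appends to the same list.
    cache.append(token)
    return cache


def add_token_good(token, cache=None):
    # Use None as a sentinel and build a fresh list on each call.
    if cache is None:
        cache = []
    cache.append(token)
    return cache


first = add_token_bad(1)
second = add_token_bad(2)  # appends to the same list object as `first`
print(first is second)     # True: state leaked between calls

print(add_token_good(1))   # [1]
print(add_token_good(2))   # [2]
```

The `None` sentinel is the common idiom for this; it keeps each call independent unless the caller explicitly passes in a list to share.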