Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Going to merge, @haileyschoelkopf, since this version makes some API changes that need to be reflected in the evals we are starting today. We can always modify or revert.