Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Here's some initial eval code for you guys to take a look at and give feedback on. I'll push additional commit(s) to support Dolomites, MuSiQue, and QMSum.