Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking KV cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Yes, Llama-2 and Llama-3 work as before with the correct chat templates. I reformatted with ruff and merged via the command line, so I'll close the PR. Relevant commit ID: 6892ea11884a74a63c6462e5099344d41b9f58bd
LGTM as long as Llama-2 still works and uses the Llama-2 chat template (which I think it does from reading the code!)
Do you mind running
ruff format *.py
and committing the formatted code?