Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking KV cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Yes, Llama-2 and Llama-3 work as before with the correct chat templates. I reformatted with ruff and merged via the command line, so I'll close the PR. Relevant commit ID: 6892ea11884a74a63c6462e5099344d41b9f58bd
LGTM as long as Llama-2 still works and uses the Llama-2 chat template (which I think it does from reading the code!)
Do you mind running
ruff format *.py
and committing the formatted code?