AnswerDotAI / cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
https://www.answer.ai/posts/2024-08-01-cold-compress.html
BSD 3-Clause "New" or "Revised" License
85 stars, 8 forks

Profile Llama3 Attention Heads #4

Closed by griff4692 3 months ago

griff4692 commented 5 months ago

See this section of the writeup.

This issue involves implementing the paper below in gpt-fast and running a sanity check to confirm that the profiled attention heads are accurate for Llama-3.
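
As a rough illustration of what such a sanity check could look like, here is a minimal sketch in plain Python. It is not the paper's method, and it does not use gpt-fast or real Llama-3 weights. It assumes a hypothetical profiling rule: a head counts as "local" when most of its attention mass lands on the last few tokens, and "global" otherwise. The function names, the `window` size, and the `threshold` are all made up for the example.

```python
import math

def causal_softmax(scores):
    """Row-wise softmax over a square score matrix with a causal mask."""
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # query i can only see keys 0..i
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        out.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return out

def local_mass(attn, window):
    """Mean share of attention each query puts on its `window` most recent keys."""
    shares = []
    for i, row in enumerate(attn):
        start = max(0, i - window + 1)
        shares.append(sum(row[start : i + 1]))
    return sum(shares) / len(shares)

def profile_heads(head_scores, window=2, threshold=0.9):
    """Label each head 'local' or 'global' (hypothetical labels and threshold)."""
    profile = {}
    for name, scores in head_scores.items():
        mass = local_mass(causal_softmax(scores), window)
        profile[name] = ("local" if mass >= threshold else "global", mass)
    return profile

# Toy pre-softmax scores for two synthetic heads over 4 tokens (not real model data).
n = 4
diag_head = [[10.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
flat_head = [[0.0] * n for _ in range(n)]

result = profile_heads({"diag": diag_head, "flat": flat_head})
for name, (label, mass) in result.items():
    print(f"{name}: {label} (local mass {mass:.3f})")
```

With real attention scores captured from the model, the check would amount to comparing labels like these against the head profile the implementation produces and flagging any heads where the two disagree.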