AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
https://www.answer.ai/posts/2024-08-01-cold-compress.html
BSD 3-Clause "New" or "Revised" License
85 stars
8 forks
issues
Question on Performance Comparison using Different Cache Bit Precision
#46
soumendukrg
opened
1 week ago
0
Installation Issue
#45
soumendukrg
opened
1 week ago
0
How to get attention scores
#44
wiluen
opened
1 month ago
1
Question of evaluation
#43
freeSoul-SNU
opened
1 month ago
0
SnapKV
#42
SimJeg
opened
2 months ago
2
torch dependency results in error
#41
maxjeblick
opened
2 months ago
1
Does the repo support quantization methods? Does the repo support kv merge methods?
#40
foreverpiano
closed
2 months ago
5
It seems the compression doesn't work and compression ratio is always =0
#39
foreverpiano
closed
2 months ago
5
Quantized cache
#38
fladhak
closed
2 months ago
0
Small README changes
#37
haileyschoelkopf
closed
2 months ago
0
Implement ThinK
#36
griff4692
opened
2 months ago
0
Implement PyramidInfer
#35
griff4692
opened
2 months ago
0
Added support for Llama-3.1 and rope scaling.
#34
fladhak
closed
3 months ago
0
Re-writes code to make torch.compilable.
#33
griff4692
closed
3 months ago
0
Adding code to parallelize evals over multiple GPUs
#32
fladhak
closed
3 months ago
0
Added option to compress cache after prefill, before decoding first token.
#31
fladhak
closed
3 months ago
0
Implement InfLLM
#30
griff4692
opened
3 months ago
0
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
#29
griff4692
opened
3 months ago
0
Adding four tasks from RULER
#28
rbiswasfc
closed
4 months ago
1
Filter dataset to remove examples with prompts larger than max length supported by the model.
#27
fladhak
closed
4 months ago
0
Adding scrolls/quality benchmark
#26
rbiswasfc
closed
4 months ago
0
Implements FastGen in a naive way with mostly sparse masks.
#25
griff4692
closed
4 months ago
1
Added code for selecting answer based on logits for MCQ, along with code for TruthfulQA.
#24
fladhak
closed
4 months ago
2
add gist model generation utils to library
#23
uSaiPrashanth
opened
4 months ago
1
Add L2 Norm Cache and refactor prompt compression to its own file.
#22
griff4692
closed
4 months ago
0
Merge branch main into gist-tokens
#21
uSaiPrashanth
closed
4 months ago
0
Merge latest version of main branch into gist tokens
#20
uSaiPrashanth
closed
4 months ago
0
Added support for MuSiQue dataset, along with a small bug fix for generate
#19
fladhak
closed
4 months ago
0
Added Dolomites and QMSum datasets.
#18
fladhak
closed
4 months ago
1
Update Tasks to allow for train, val, test splits to be used.
#17
griff4692
closed
4 months ago
1
Adding code for evaluation, along with Squality and reference-based metrics.
#16
fladhak
closed
4 months ago
1
Merge branch main into gist-tokens
#15
uSaiPrashanth
closed
4 months ago
0
Fix various bugs
#14
VikParuchuri
closed
4 months ago
0
Added support for Qwen2 models.
#13
fladhak
closed
4 months ago
2
Naive implementation of sparse Top-K attention approximation from FlexGen.
#12
griff4692
closed
3 months ago
0
Implement Scissorhands KV-cache compression & SnapKV prompt compression
#11
griff4692
closed
4 months ago
0
Implement Scissorhands paper as KVCacheScissorHands.
#10
griff4692
closed
4 months ago
0
Implements window KV-Cache Compression Strategy
#9
griff4692
closed
4 months ago
3
Implement Evaluations (Decide on datasets and benchmark initial methods)
#8
griff4692
closed
3 months ago
0
Record memory consumption
#7
griff4692
closed
4 months ago
0
Record Model Speed in evals
#6
griff4692
closed
4 months ago
0
Experiment with Fixed Global Tokens
#5
griff4692
closed
4 months ago
1
Profile Llama3 Attention Heads
#4
griff4692
closed
3 months ago
0
Compute Heavy Hitters KV-Cache Eviction Policy
#3
griff4692
closed
3 months ago
0
LLama3 GIST
#2
griff4692
closed
3 months ago
0
Benchmark prompt summarizers with frontier LLMs (GPT-4o / Opus)
#1
griff4692
closed
4 months ago
1