AnswerDotAI cold-compress issues

AnswerDotAI / cold-compress

Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods. It is built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

https://www.answer.ai/posts/2024-08-01-cold-compress.html

BSD 3-Clause "New" or "Revised" License

85 stars, 8 forks

issues

Question on Performance Comparison using Different Cache Bit Precision

#46 soumendukrg opened 1 week ago
0
Installation Issue

#45 soumendukrg opened 1 week ago
0
How to get attention scores

#44 wiluen opened 1 month ago
1
Question of evaluation

#43 freeSoul-SNU opened 1 month ago
0
SnapKV

#42 SimJeg opened 2 months ago
2
torch dependency results in error

#41 maxjeblick opened 2 months ago
1
Does the repo support quantization methods? Does the repo support kv merge methods?

#40 foreverpiano closed 2 months ago
5
It seems the compression doesn't work and the compression ratio is always 0

#39 foreverpiano closed 2 months ago
5
Quantized cache

#38 fladhak closed 2 months ago
0
Small README changes

#37 haileyschoelkopf closed 2 months ago
0
Implement ThinK

#36 griff4692 opened 2 months ago
0
Implement PyramidInfer

#35 griff4692 opened 2 months ago
0
Added support for Llama-3.1 and rope scaling.

#34 fladhak closed 3 months ago
0
Re-writes code to make torch.compilable.

#33 griff4692 closed 3 months ago
0
Adding code to parallelize evals over multiple GPUs

#32 fladhak closed 3 months ago
0
Added option to compress cache after prefill, before decoding first token.

#31 fladhak closed 3 months ago
0
Implement InfLLM

#30 griff4692 opened 3 months ago
0
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

#29 griff4692 opened 3 months ago
0
Adding four tasks from RULER

#28 rbiswasfc closed 4 months ago
1
Filter dataset to remove examples with prompts larger than max length supported by the model.

#27 fladhak closed 4 months ago
0
Adding scrolls/quality benchmark

#26 rbiswasfc closed 4 months ago
0
Implements FastGen in a naive way with mostly sparse masks.

#25 griff4692 closed 4 months ago
1
Added code for selecting answer based on logits for MCQ, along with code for TruthfulQA.

#24 fladhak closed 4 months ago
2
add gist model generation utils to library

#23 uSaiPrashanth opened 4 months ago
1
Add L2 Norm Cache and refactor prompt compression to its own file.

#22 griff4692 closed 4 months ago
0
Merge branch main into gist-tokens

#21 uSaiPrashanth closed 4 months ago
0
Merge latest version of main branch into gist tokens

#20 uSaiPrashanth closed 4 months ago
0
Added support for MuSiQue dataset, along with a small bug fix for generate

#19 fladhak closed 4 months ago
0
Added Dolomites and QMSum datasets.

#18 fladhak closed 4 months ago
1
Update Tasks to allow for train, val, test splits to be used.

#17 griff4692 closed 4 months ago
1
Adding code for evaluation, along with SQuALITY and reference-based metrics.

#16 fladhak closed 4 months ago
1
Merge branch main into gist-tokens

#15 uSaiPrashanth closed 4 months ago
0
Fix various bugs

#14 VikParuchuri closed 4 months ago
0
Added support for Qwen2 models.

#13 fladhak closed 4 months ago
2
Naive implementation of sparse Top-K attention approximation from FlexGen.

#12 griff4692 closed 3 months ago
0
Implement Scissorhands KV-cache compression & SnapKV prompt compression

#11 griff4692 closed 4 months ago
0
Implement Scissorhands paper as KVCacheScissorHands.

#10 griff4692 closed 4 months ago
0
Implements window KV-Cache Compression Strategy

#9 griff4692 closed 4 months ago
3
Implement Evaluations (Decide on datasets and benchmark initial methods)

#8 griff4692 closed 3 months ago
0
Record memory consumption

#7 griff4692 closed 4 months ago
0
Record Model Speed in evals

#6 griff4692 closed 4 months ago
0
Experiment with Fixed Global Tokens

#5 griff4692 closed 4 months ago
1
Profile Llama3 Attention Heads

#4 griff4692 closed 3 months ago
0
Compute Heavy Hitters KV-Cache Eviction Policy

#3 griff4692 closed 3 months ago
0
Llama3 GIST

#2 griff4692 closed 3 months ago
0
Benchmark prompt summarizers with frontier LLMs (GPT-4o / Opus)

#1 griff4692 closed 4 months ago
1