efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
259 stars · 21 forks
Issues (newest first)
#23  Question about KV Cache quantization (SherrySwift, opened 2 weeks ago, 3 comments)
#22  Question about end-to-end efficiency evaluation of Atom (cokeshao, closed 1 month ago, 2 comments)
#21  Is it possible to add support for other models? (wlll123456, closed 1 month ago, 1 comment)
#20  Question about the synchronization in the low-precision kernel (cat538, closed 2 months ago, 2 comments)
#19  TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position' (galenyu, closed 2 months ago, 2 comments)
#18  LLM model load hanging problem (jimmy-adams, closed 2 months ago, 2 comments)
#17  Question regarding the efficiency evaluation (FlyFoxPlayer, closed 4 months ago, 3 comments)
#16  [Major] Add support for Mixtral8x7b (cylinbao, closed 5 months ago, 0 comments)
#15  Question about the calibration data (ghost, closed 5 months ago, 3 comments)
#14  How to load quantized weights? (ghost, closed 5 months ago, 1 comment)
#13  feat: adapt GPTQ to FP4 quantization (happierpig, closed 5 months ago, 1 comment)
#12  RuntimeError when quantizing a Llama model (ghost, closed 5 months ago, 10 comments)
#11  feat: add FP4 evaluations (happierpig, closed 5 months ago, 0 comments)
#10  Porting SVD into Atom (shadowpa0327, closed 6 months ago, 0 comments)
#9   AssertionError (muzi0111, closed 6 months ago, 1 comment)
#8   Error: same device (muzi0111, closed 6 months ago, 1 comment)
#7   The perplexity for LLaMA-7B is very large (priscilla-pan, closed 7 months ago, 3 comments)
#6   Why is dynamic quantization not included when reproducing results? (priscilla-pan, closed 7 months ago, 3 comments)
#5   Adding OPT support for the simulated quantization (cylinbao, closed 7 months ago, 0 comments)
#4   How to compare performance with vllm/tgi/lightllm or other LLM serving frameworks? (irasin, closed 8 months ago, 3 comments)
#3   PPL on PTB (MrDoghead, closed 8 months ago, 2 comments)
#2   Issue with `c4` dataset for eval (HamidShojanazeri, closed 8 months ago, 1 comment)
#1   Update README.md (eltociear, closed 8 months ago, 0 comments)