Cornell-RelaxML quip-sharp issues

GNU General Public License v3.0

486 stars, 42 forks

issues

[Question] code question in `quip.quantize_linear`

#75 jhss opened 1 week ago
1
How do you interact with my compressed weights

#74 hsb1995 opened 4 weeks ago
6
What is the function of (finetune_decoder_layer)?

#73 LiMa-cas opened 1 month ago
7
llama 3 8b results

#72 dorsa-zeinali opened 1 month ago
2
Can we run the kernels on V100 machines?

#71 mgholamikn opened 1 month ago
3
Question about the paper.

#70 jhss opened 1 month ago
2
Question about the ROUND operation

#69 CPegasus opened 1 month ago
1
incompatible checkpoint loading and saving.

#68 YilunKuang closed 1 month ago
2
Reproducing the Results

#67 mgholamikn closed 1 month ago
6
questions for # convert model to hf format for end to end fine tuning

#66 LiMa-cas closed 1 month ago
7
[WIP] Faster generation with CUDA graphs

#65 tsengalb99 closed 1 month ago
0
.m16n8k8 needs CUDA sm8.0 or higher

#64 oreo0906 closed 1 month ago
3
throughput question

#63 leeglg closed 1 month ago
3
Memory requirement

#62 dorsa-zeinali closed 1 month ago
3
devset = utils.sample_rp1t(tokenizer, AttributeError: module 'lib.utils' has no attribute 'sample_rp1t'

#61 LiMa-cas closed 1 month ago
2
Question about the (L - I) term in LDLQ.

#60 KimythAnly closed 1 month ago
2
icml cleanup

#59 tsengalb99 closed 2 months ago
0
Is there a way to support tensor parallelism for inference?

#58 ChuanhongLi closed 2 months ago
1
Confused about the description - QuIP# is also the first PTQ method where 3-bit models scale better than 4-bit models

#57 ChuanhongLi closed 2 months ago
1
Question about the length of the sampled data

#56 ChuanhongLi closed 3 months ago
3
[Question] The total time about Llama-3-70B quantization.

#55 ChenMnZ closed 4 months ago
1
Turn these scripts into a python package

#54 osbm opened 4 months ago
2
LLaMA-3 support and questions

#53 catid opened 5 months ago
5
[Question] How to reproduce QuIP# (No FT & No E_8)

#52 ChenMnZ closed 1 month ago
2
[Question] Why only quantize an individual linear layer during block-wise optimization in fine-tuning?

#51 ChenMnZ closed 5 months ago
2
[Question] Word Embedding Quantization?

#50 4eyes4u closed 6 months ago
1
Package and Release on (Test)PyPI

#49 nalzok opened 6 months ago
1
[Question] Why discard the last element of logit.

#48 ChenMnZ closed 6 months ago
2
[Question] Different target in the e2e finetuning

#47 ChenMnZ closed 6 months ago
2
model size confirmation

#46 ysong2123 closed 7 months ago
3
Procedures for quantizing generic architectures

#45 ad8e closed 7 months ago
1
Group-wise Quantization

#44 arman-kazemi closed 7 months ago
1
NameError: name 'quant_emb' is not defined

#43 eadst closed 7 months ago
5
better sharding

#42 tsengalb99 closed 7 months ago
0
Release20240212

#41 tsengalb99 closed 7 months ago
0
Pytorch dequantization

#40 JeevanBhoot closed 7 months ago
3
Llama-2-7b-E8P-2Bit not loading correctly.

#39 YilunKuang closed 8 months ago
1
How many samples do you use in checkpoints?

#38 YangWang92 closed 8 months ago
2
faster 1 bit kernel

#37 tsengalb99 closed 8 months ago
0
TypeError: decompress_e8p_origorder(): incompatible function arguments.

#36 KnutJaegersberg closed 8 months ago
10
HF Mistral-7B and Llama 2 7b chat Not working.

#35 kklivil closed 8 months ago
10
patch c4

#34 tsengalb99 closed 8 months ago
0
release20240109

#33 tsengalb99 closed 8 months ago
0
ROCm Build Error

#32 lufixSch opened 9 months ago
3
Exception: Saved weights version (0) does not match the codebook version (1).

#31 KnutJaegersberg closed 9 months ago
6
Llamafied model has some issues in hfize_llama.py

#30 Minami-su closed 9 months ago
22
Question about error proxy in show_metrics

#29 YangWang92 closed 9 months ago
6
In the same vein as #17

#28 vince62s closed 9 months ago
1
There are some issues when I try to run the Yi34b model with 2bits quant

#27 Rashomon-Chinglo closed 9 months ago
4
distribute the memory usage evenly across both cards?

#26 Minami-su closed 9 months ago
2