-
- [ ] [Guide to choosing quants and engines : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1anb2fz/comment/kprbduc/)
-
Looking at the weight values, we see that they are bfloat16.
Further, conversion to ternary is done at run-time (in FusedBitLinear).
To see if the model still worked with ternary weights, I re-wro…
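
Since the excerpt is about converting bfloat16 weights to ternary on the fly, here is a minimal sketch of such a run-time ternarization step. It uses the absmean scheme from BitNet b1.58, which is an assumption on my part; FusedBitLinear's exact logic may differ.

```python
import torch

def ternarize_absmean(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a float/bfloat16 weight matrix to {-1, 0, +1} at run-time.

    Assumption: per-tensor absmean scaling as in BitNet b1.58; the actual
    FusedBitLinear implementation may differ.
    """
    scale = w.abs().mean().clamp(min=eps)          # absmean scale
    w_ternary = (w / scale).round().clamp(-1, 1)   # values in {-1, 0, +1}
    return w_ternary, scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16).float()
w_t, s = ternarize_absmean(w)
w_deq = w_t * s  # what the layer effectively multiplies activations by
```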
-
Hi!
Thanks for such a useful tool!
I have a question about `model_seqlen`:
As far as I can see, the default value in main.py is 4096. What if I use a smaller value, e.g. 1024, when quantizing a MoE Mixtral m…
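
For readers unfamiliar with the parameter, here is a hypothetical illustration of how GPTQ-style pipelines typically slice calibration text into model_seqlen-token samples; the function name is mine, not the tool's. The practical effect of a smaller value is more, shorter samples, each carrying less long-range context for the activation statistics.

```python
import torch

def make_calibration_batches(token_ids: torch.Tensor, model_seqlen: int):
    """Split one long token stream into fixed-length calibration samples.

    Illustrative only: with model_seqlen=1024 instead of 4096 you get 4x as
    many samples from the same stream, but each one is shorter.
    """
    n = token_ids.numel() // model_seqlen
    return token_ids[: n * model_seqlen].reshape(n, model_seqlen)

stream = torch.randint(0, 32000, (16384,))           # fake tokenized corpus
print(make_calibration_batches(stream, 4096).shape)  # torch.Size([4, 4096])
print(make_calibration_batches(stream, 1024).shape)  # torch.Size([16, 1024])
```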
-
Is multi-GPU inference on RTX 4090s supported (the model is Qwen-72B fine-tuned with QLoRA)? I can fine-tune the Qwen-72B model normally via FSDP+QLoRA; I'd like to ask how to deploy it for inference on RTX 4090s.
I tried the following script for multi-GPU inference:
```
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --config_file fsdp_config.y…
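```

One route that often works for this setup, offered as an assumption rather than the repo's documented method, is to merge the QLoRA adapter into the base weights and then serve the merged model with vLLM's tensor parallelism across the four 4090s (paths below are placeholders):

```python
from peft import AutoPeftModelForCausalLM

# Merge the QLoRA adapter into the Qwen-72B base weights.
model = AutoPeftModelForCausalLM.from_pretrained("out/qwen72b-qlora")
merged = model.merge_and_unload()
merged.save_pretrained("out/qwen72b-merged")

from vllm import LLM, SamplingParams

# Serve the merged checkpoint with tensor parallelism over 4 GPUs.
llm = LLM(model="out/qwen72b-merged", tensor_parallel_size=4)
print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```

Note that an unquantized 72B model does not fit in 4 x 24 GB, so in practice the merged checkpoint usually has to be re-quantized (e.g. GPTQ or AWQ) before vLLM can serve it on 4090s.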
-
Dear Authors,
Thanks for your outstanding work. I like it and have learned a lot from it!
I tried to reproduce the weight-only quantization results in Table 5. However, I obtained some results tha…
-
First of all, thank you for your work. I've been able to run many models locally with exllamav2, a highly efficient inference library.
Recently, I tried to use exllamav2-0.0.21 …
-
Hi @czhu95 ,
Thanks for providing the code!
Recently I used your code to ternarize a ResNet-18 on CIFAR-10. First, I used tensorpack to train a ResNet-18 to a validation error of 0.083. However, …
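
I don't know which ternarization rule this repo uses, but for comparison, the classic Ternary Weight Networks rule (an assumption here, not necessarily what the code under discussion does) thresholds at 0.7 times the mean absolute weight:

```python
import torch

def twn_ternarize(w: torch.Tensor) -> torch.Tensor:
    """Ternary Weight Networks (Li & Liu, 2016) style ternarization.

    Assumption: threshold delta = 0.7 * E|W|; the repo may use another rule.
    """
    delta = 0.7 * w.abs().mean()      # ternarization threshold
    mask = w.abs() > delta            # positions kept as +/-1
    alpha = w.abs()[mask].mean()      # scale fitted over the kept weights
    w_t = torch.zeros_like(w)
    w_t[mask] = torch.sign(w[mask])
    return alpha * w_t

w = torch.randn(64, 64)
print(twn_ternarize(w).unique())      # values in {-alpha, 0, +alpha}
```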
-
Because vllm-gptq does not have issues enabled, I am raising the issue here.
https://mobiusml.github.io/hqq_blog/
HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super …
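
For anyone who wants to try it before vLLM gains support, quantizing a model with HQQ looks roughly like the following. This follows my reading of the HQQ README from that period, so treat the class names and defaults as assumptions rather than a verified API:

```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder model
model = HQQModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# No calibration set needed: HQQ solves for the quantization parameters
# directly from the weights (half-quadratic optimization).
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
model.quantize_model(quant_config=quant_config)
```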
-
I am using the latest vLLM Docker image, trying to run the Mixtral 8x7B model quantized in AWQ format. I got the error message below:
```
INFO 12-24 09:22:55 llm_engine.py:73] Initializing an LLM engine …
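```

For reference, pointing vLLM's Python API at an AWQ checkpoint looks like the sketch below; the model path is a placeholder, and whether this avoids the error above depends on the full traceback (Mixtral+AWQ support in vLLM was very new at the time):

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized Mixtral checkpoint ("path/to/mixtral-awq" is a placeholder).
llm = LLM(model="path/to/mixtral-awq", quantization="awq", tensor_parallel_size=2)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```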
-
```
(.venv) (base) mikekg@mikekg-mbp torchchat % # Llama 3 8B Instruct
python3 torchchat.py chat llama3
zsh: command not found: #
Using device=cpu Apple M1 Max
Loading model...
Time to load model: 10…
```