-
### 🚀 The feature, motivation and pitch
Any QLoRA adapters trained on large checkpoints (e.g., 70B) are unusable as we cannot use TP>1 to shard the model over multiple GPUs. Therefore, resolving this…
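For concreteness, the usage this request would enable looks roughly like the sketch below; the model name, adapter path, and TP size are placeholders, not taken from a working setup:
```python
# Minimal sketch of the desired call, assuming vLLM's offline LoRA API;
# the model name, adapter path, and tensor_parallel_size are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # large base checkpoint
    enable_lora=True,
    tensor_parallel_size=4,             # TP>1 is exactly what is blocked today
)

out = llm.generate(
    "Hello",
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_qlora", 1, "/path/to/qlora_adapter"),
)
print(out[0].outputs[0].text)
```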
-
### Your current environment
I am trying out FP8 support on AMD GPUs (MI250, MI300) and the vLLM library does not seem to support AMD GPUs yet for FP8 quantization. Is there any timeline for when thi…
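For reference, what I am running is roughly the following sketch; the model name is a placeholder, and `quantization="fp8"` is the option vLLM exposes on CUDA builds:
```python
# Rough sketch of the attempted FP8 load on ROCm; the model name is a
# placeholder and quantization="fp8" is the option vLLM exposes on CUDA.
from vllm import LLM

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")
print(llm.generate("Hello")[0].outputs[0].text)
```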
-
### Motivation
I wanted to deploy Mixtral8x22B with quantization, but it says that lmdeploy doesn't support the Mixtral8x22B model.
### Related resources
_No response_
### Additional context
_N…
-
Hi, thanks for the lib! When checking https://github.com/vllm-project/llm-compressor/issues/935, it seems that `one_shot` auto-saves everything to the output folder. That looks great, but if I understa…
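For context, the pattern I have in mind is roughly the following; the recipe, dataset, and `output_dir` values are placeholders, and the import path may differ between versions:
```python
# Hedged sketch of the one-shot flow being discussed; recipe, dataset, and
# output_dir values are placeholders. The import path may vary across versions
# (older releases expose it under llmcompressor.transformers).
from llmcompressor import oneshot

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe="recipe.yaml",            # e.g. a GPTQ / W8A8 recipe
    output_dir="./compressed-out",   # where everything gets auto-saved
    max_seq_length=2048,
    num_calibration_samples=512,
)
```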
-
The part that's a bit confusing is dynamic indexing. For consistency, the underlying integers still need to be scaled before they can be used as indexing integers.
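As a toy illustration of what I mean (NumPy, with made-up numbers), the stored integers carry a quantization scale and have to be rescaled into real index values before the gather:
```python
import numpy as np

# Toy illustration with made-up numbers: the stored int8 values are quantized
# with a scale and must be rescaled before they can serve as gather indices.
scale = 4
q_idx = np.array([0, 1, 2, 3], dtype=np.int8)   # quantized representation
idx = q_idx.astype(np.int64) * scale            # rescale -> [0, 4, 8, 12]

table = np.arange(16) * 10
print(table[idx])                               # [  0  40  80 120]
```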
-
I saw that it compiled; it can give about a 20% performance increase on Flux, but it seems to have no effect on CogVideo 1.5.
The quantization is FP8 and faster cache is enabled.
-
### System Info
SERVER:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
PRETTY_NAME:"Debian GNU/Linux 11 (bullseye)"
python:3.11.5
conda:23.10.0
torch:2.4.1+cpu
### Running Xinference with D…
-
### Your current environment
vllm==0.6.3.post1
### Model Input Dumps
```bash
ValueError: Weight input_size_per_partition = 10944 is not divisible by min_thread_k = 128. Consider reducing tensor_pa…
```
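For what it's worth, a quick divisibility check (assuming the unsharded dimension is 2 × 10944 = 21888 and the failing run used tensor_parallel_size=2) shows why reducing TP is suggested:
```python
# Assumes the unsharded weight dimension is 2 * 10944 = 21888 and the failing
# run used tensor_parallel_size=2; only TP=1 satisfies min_thread_k=128 here.
full_size, min_thread_k = 2 * 10944, 128
for tp in (1, 2, 4):
    per_partition = full_size // tp
    ok = per_partition % min_thread_k == 0
    print(f"tp={tp}: {per_partition} % {min_thread_k} == 0 -> {ok}")
# tp=1 -> True, tp=2 -> False (10944, the value in the error), tp=4 -> False
```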
-
TensorRT-LLM has great potential for allowing people to run larger models efficiently with limited hardware resources. Unfortunately, the current quantization workflow requires significant computation…
-
**Bug description.**
When trying to pull a specific quantization tag for a model through Ollama, I was getting the following error: `The specified tag is not a valid quantization scheme.`
At first …