-
Many recent papers have addressed the challenges of quantizing activations for LLMs.
Examples:
https://github.com/ziplab/QLLM?tab=readme-ov-file#%F0%9F%9B%A0-install
https://github.com/mit-h…
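For context, the recurring difficulty these projects address is that LLM activations contain large outlier channels, so a single quantization scale gets dominated by a few values. Below is a minimal sketch of plain per-token dynamic INT8 activation quantization; it is my own illustration of the failure mode, not any specific paper's method.

```python
import torch

def quantize_per_token_int8(x: torch.Tensor):
    """Symmetric per-token dynamic quantization: one scale per activation row."""
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# One simulated outlier channel inflates the scale and drowns out the rest,
# which is the failure mode the linked projects work around.
x = torch.randn(4, 1024)
x[:, 0] *= 100.0
q, s = quantize_per_token_int8(x)
print("mean abs error:", (dequantize(q, s) - x).abs().mean().item())
```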
-
**Describe the bug**
When using the preset W8A8 recipe from llm-compressor, the resulting model's config.json fails validation when loaded by HF Transformers. This is a dev version of Tr…
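For reference, a minimal repro sketch along the lines of the documented llm-compressor W8A8 example; the model name, dataset, and output path are placeholders, and exact import paths may differ across versions:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# W8A8 preset as in the llm-compressor examples: SmoothQuant to tame
# activation outliers, then GPTQ for INT8 weights and activations.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    dataset="open_platypus",                     # placeholder calibration set
    recipe=recipe,
    output_dir="./tinyllama-w8a8",               # the failing config.json lands here
    max_seq_length=2048,
    num_calibration_samples=512,
)
```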
-
For large models (>350B parameters), the weights can't be loaded on a single node (e.g., 8 × 80 GB GPUs).
Although methods like CPU/disk offloading can overcome the limits of GPU memory, the quantization spee…
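For reference, the usual offload path is Accelerate's device_map via Transformers; a minimal sketch with a placeholder checkpoint name. Every offloaded layer round-trips over PCIe during calibration, which is exactly where the speed problem shows up:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" shards across available GPUs first, then spills to CPU
# RAM, and finally to disk via offload_folder.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/350b-model",        # placeholder checkpoint
    device_map="auto",
    offload_folder="./offload",   # disk spill area for what CPU RAM can't hold
    torch_dtype="auto",
)
```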
-
Running ``quantize.py`` with ``--mode int4-gptq`` does not seem to work:
- the code tries to import ``lm-evaluation-harness``, which is not included, documented, or used
- the import in ``eval.py`` is incorrect…
-
I'm new to this specific project, so I don't say any of the following with high confidence.
Things that I see as important for quantization:
*Inference speed*
- AWQ seems best on this front, t…
-
Hi. Is there any support for converting the YOLOv8-seg model to INT8 precision and using it with DeepStream?
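For reference, one common route is exporting to a TensorRT INT8 engine that DeepStream then loads directly. A sketch using the Ultralytics export API, with the weights file and calibration dataset as assumptions:

```python
from ultralytics import YOLO

# Export the segmentation model to a TensorRT engine with INT8 calibration;
# DeepStream can consume the resulting .engine file.
model = YOLO("yolov8n-seg.pt")          # assumed weights file
model.export(
    format="engine",                    # TensorRT engine output
    int8=True,                          # enable INT8 calibration
    data="coco128-seg.yaml",            # assumed calibration dataset yaml
)
```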
-
Hi all,
We've recently open-sourced VPTQ (Vector Post-Training Quantization), a novel post-training quantization method that leverages vector quantization to achieve hi…
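To make the idea concrete, here is a minimal sketch of the underlying vector-quantization step: group weights into short vectors, learn a codebook with plain k-means, and store only centroid indices. This illustrates the concept, not the actual VPTQ algorithm:

```python
import torch

def vq_quantize(weight: torch.Tensor, vec_dim: int = 4,
                codebook_size: int = 256, iters: int = 10):
    """Quantize a 2-D weight matrix to (codebook, indices) via plain k-means."""
    flat = weight.reshape(-1, vec_dim)                  # group weights into vectors
    init = torch.randperm(flat.shape[0])[:codebook_size]
    codebook = flat[init].clone()                       # random-row initialization
    for _ in range(iters):
        assign = torch.cdist(flat, codebook).argmin(dim=1)  # nearest centroid
        for k in range(codebook_size):
            members = flat[assign == k]
            if members.numel():
                codebook[k] = members.mean(dim=0)       # k-means update
    return codebook, assign

def vq_dequantize(codebook, assign, shape):
    return codebook[assign].reshape(shape)              # look up centroids

w = torch.randn(256, 256)
cb, idx = vq_quantize(w)
w_hat = vq_dequantize(cb, idx, w.shape)
print("reconstruction MSE:", torch.mean((w - w_hat) ** 2).item())
```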
-
### System Info
torch 2.5.1+cu121
diffusers 0.31.0
torchao 0.7.0+cpu
Python 3.11.10
Windows 11
### Information
- [X] The official example scr…
-
### Your current environment
"""
This example shows how to use LoRA with different quantization techniques
for offline inference.
Requires HuggingFace credentials for access.
"""
import gc
…
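Since the script is cut off above, here is a minimal sketch of the same pattern with the vLLM offline API; the checkpoint, adapter name, and path are placeholders, not from the original example:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load a pre-quantized AWQ checkpoint with LoRA support enabled.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",   # placeholder quantized checkpoint
    quantization="awq",
    enable_lora=True,
    max_lora_rank=16,
)

prompts = ["Explain activation quantization in one sentence."]
params = SamplingParams(temperature=0.0, max_tokens=64)

# Attach a LoRA adapter per request; name, id, and path are assumptions.
outputs = llm.generate(
    prompts,
    params,
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora_adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```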