-
Hi there,
I was struggling with how to implement quantization with AutoAWQ, as mentioned on the home page. I was trying to quantize the 7B Qwen2-VL model, but even with 2 A100 80GB GPUs I still get CUDA OOM…
-
We need a separate product quantization API that is decoupled from IVF but can still be composed into IVF.
Ideally this API would follow FAISS or scikit-learn's transformer/estimator conventions.
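As an illustration, a scikit-learn-style estimator for product quantization could look roughly like the sketch below. The `ProductQuantizer` class, its parameters, and its methods are hypothetical, not an existing FAISS or scikit-learn API:

```python
import numpy as np
from sklearn.cluster import KMeans

class ProductQuantizer:
    """Hypothetical estimator-style PQ sketch: split each vector into
    `m` sub-vectors, learn a k-means codebook per sub-space, and emit
    one uint8 code per sub-space in transform()."""

    def __init__(self, m=4, k=256):
        self.m = m                # number of sub-quantizers
        self.k = k                # codebook size per sub-space
        self.codebooks_ = None    # fitted KMeans per sub-space

    def fit(self, X):
        d = X.shape[1]
        assert d % self.m == 0, "dimension must be divisible by m"
        sub = d // self.m
        self.codebooks_ = []
        for i in range(self.m):
            km = KMeans(n_clusters=self.k, n_init=1, random_state=0)
            km.fit(X[:, i * sub:(i + 1) * sub])
            self.codebooks_.append(km)
        return self

    def transform(self, X):
        sub = X.shape[1] // self.m
        codes = np.empty((X.shape[0], self.m), dtype=np.uint8)
        for i, km in enumerate(self.codebooks_):
            codes[:, i] = km.predict(X[:, i * sub:(i + 1) * sub])
        return codes
```

Because it exposes only `fit`/`transform`, such a quantizer could be trained standalone or plugged into an IVF index that calls it on residuals, which is the decoupling the request is about.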
-
Similar to affine quantization, we can implement codebook- or lookup-table-based quantization, which is another popular type of quantization, especially at lower bit widths like 4 bits or below (used in ht…
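To make the idea concrete, here is a tiny 1-D codebook quantizer sketch (the helper names are illustrative, not any library's API): weights are clustered into 2^bits centroids with a small k-means loop, then stored as per-weight indices into the shared lookup table.

```python
import numpy as np

def lut_quantize(w, bits=4, iters=10):
    """Codebook (lookup-table) quantization sketch for a 1-D weight
    array: learn 2**bits centroids, return (indices, codebook)."""
    k = 2 ** bits
    # Initialize centroids on evenly spaced quantiles of the weights,
    # which adapts the codebook to the weight distribution.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = w[idx == j].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), centroids

def lut_dequantize(idx, centroids):
    """Reconstruct weights by indexing into the codebook."""
    return centroids[idx]
```

Unlike affine quantization, the reconstruction levels need not be uniformly spaced, which is why codebook methods tend to hold up better at 4 bits and below.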
-
Hi,
Have you tried quantizing Mamba? Do you plan on releasing quantized versions?
Can you share your thoughts on quantizing Mamba, given the sensitivity of the model's recurrent dynamics?
Thanks
-
As there are already a few models with Half-Quadratic Quantization (HQQ) out there, vLLM should also support them:
```sh
api_server.py: error: argument --quantization/-q: invalid choice: 'hqq' (choose from …
```
-
### SDK
Python
### Description
- From https://huggingface.co/blog/embedding-quantization: _Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval_
- Also from https…
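A rough sketch of the binary-quantization idea from that blog post (the function names are illustrative, not the sentence-transformers API): keep only the sign of each embedding dimension, pack 8 dimensions per byte for a 32x memory reduction over float32, and rank by Hamming distance.

```python
import numpy as np

def binary_quantize(embeddings):
    """Binary embedding quantization sketch: keep the sign bit of
    each dimension and pack 8 dimensions into one byte."""
    signs = (embeddings > 0).astype(np.uint8)
    return np.packbits(signs, axis=-1)

def hamming_scores(query_bits, doc_bits):
    """Rank packed codes by Hamming distance (lower = more similar).
    Popcount is done via unpackbits for clarity, not speed."""
    xor = np.bitwise_xor(doc_bits, query_bits)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)
```

Usage: quantize the corpus once, quantize each query at search time, and take the documents with the smallest Hamming distance; the blog describes an optional float-rescoring pass on that shortlist.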
-
### Feature request
There is too much boilerplate; a template that resolves loading, quantization, and device would help. E.g.:
device: auto -> torch.cuda.is_available() -> cuda or mps.
dtype: float32 -> float32, no q…
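The `device: auto` resolution described above could be sketched as follows (the helper name is hypothetical, not an existing API):

```python
import torch

def resolve_device(device="auto"):
    """Sketch of 'device: auto' resolution: prefer CUDA, then MPS,
    then fall back to CPU; pass through any explicit device string."""
    if device != "auto":
        return torch.device(device)
    if torch.cuda.is_available():
        return torch.device("cuda")
    # MPS backend only exists on recent PyTorch builds, so probe defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

The dtype side would be resolved analogously, mapping a config string to a `torch.dtype` plus an optional quantization config.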
-
https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only
```sh
bash run_quant.sh --input_model=./Meta-Llama-3.1-8B -…
```
-
Dear author, when I reproduce the w4a4 quantization of Vicuna-7B-v1.5 on a single A800 using the default parameters in run.sh, I get:
```
***** 0-shot *****
***** MMLU_eval subcategories metrics …
```
-
### The quantization format
Hi all,
We have recently designed and open-sourced a new method for Vector Quantization called Vector Post-Training Quantization (VPTQ). Our work is available at [VPTQ…