-
### Motivation.
Higher throughput and memory savings are always cool 😎
I think this could be integrated fairly easily; what do you think of its design?
### Proposed Change.
https://github.com…
-
When converting [nemolita-21b](https://huggingface.co/win10/nemolita-21b), which is a merged model, `convert.py` fails with this error:
```shell
Traceback (most recent call last):
File "/hom…
```
-
Background:
The [spin quant paper](https://arxiv.org/pdf/2405.16406) introduces a method of improving quantization by adding additional rotation matrices to the model weights that improve quantizatio…
-
Small image (mind the copyright)
![quant_img2](https://github.com/YoungHaKim7/Cpp_Training/assets/67513038/7e4ac027-f6ca-4679-a09b-982431447afa)
-
How much GPU memory is needed to quantize flux-dev?
Can it be offloaded to the CPU when there is not enough GPU memory?
The following part of your input was truncated because CLIP can only handle sequences up to 77…
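For rough sizing: the FLUX.1-dev transformer has about 12B parameters, so its bf16 weights alone are on the order of 24 GB before activations and text encoders. In diffusers, `pipe.enable_model_cpu_offload()` (which requires `accelerate`) keeps only the component currently running on the GPU, so peak VRAM drops to roughly the largest single component. The same idea can be mimicked by hand with forward hooks; the tiny model below is made up purely for illustration:

```python
import torch

def add_offload_hooks(module: torch.nn.Module, device: str) -> None:
    """Keep `module` on CPU; move it to `device` only for its forward pass."""
    def pre(mod, inputs):
        mod.to(device)
        return tuple(t.to(device) for t in inputs)

    def post(mod, inputs, output):
        mod.to("cpu")
        return output.to("cpu")

    module.register_forward_pre_hook(pre)
    module.register_forward_hook(post)

device = "cuda" if torch.cuda.is_available() else "cpu"
net = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4)
)
for layer in net:
    add_offload_hooks(layer, device)

y = net(torch.randn(2, 8))  # layers visit the accelerator one at a time
```

After the forward pass every layer is back on the CPU, so only one layer's weights occupy device memory at any moment, at the cost of per-layer transfer overhead.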
-
# Quantize the model
model_prepared = tq.prepare(model_fused)
model_quantized = tq.convert(model_prepared)
# Define the quantization configuration
quant_config = tq.get_default_qconfig('fbge…
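For reference, in PyTorch eager-mode post-training quantization the qconfig has to be attached *before* `prepare`, and a calibration pass is needed between `prepare` and `convert`; calling `prepare` on a model with no qconfig inserts no observers, so nothing actually gets quantized. A minimal self-contained sketch (the tiny module and calibration data are made up for illustration):

```python
import torch
import torch.ao.quantization as tq

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized boundary
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = tq.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyModel().eval()
# 1. Define the quantization configuration (before prepare!)
model.qconfig = tq.get_default_qconfig("fbgemm")
# 2. Insert observers
model_prepared = tq.prepare(model)
# 3. Calibrate with representative data
model_prepared(torch.randn(8, 4))
# 4. Convert observers + weights to the quantized model
model_quantized = tq.convert(model_prepared)
out = model_quantized(torch.randn(2, 4))
```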
-
Version: `kallisto 0.51.1`
I'm following a workflow outlined in [issue 456](https://github.com/pachterlab/kallisto/issues/456) for using lr-kallisto with bulk ONT. `kallisto bus`, `bustools sort`, …
-
### 🐛 Describe the bug
In the `embedding_4bit` implementation [here](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py#L213), it assumes the quantized da…
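Packed 4-bit weights store two signed nibbles per byte, so the unpack step has to sign-extend each nibble; if a kernel assumes the wrong nibble order or an unsigned layout, the decoded values are garbage. A generic low-nibble-first pack/round-trip sketch in NumPy (this packing convention is an assumption for illustration, not necessarily the one the ExecuTorch pass uses):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack int8 values in [-8, 7] (even length) two-per-byte, low nibble first."""
    u = q.astype(np.uint8) & 0x0F          # two's-complement nibbles
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(p: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: split each byte and sign-extend the nibbles."""
    lo = (p & 0x0F).astype(np.int8)
    hi = ((p >> 4) & 0x0F).astype(np.int8)
    lo = np.where(lo > 7, lo - 16, lo)     # sign-extend 4-bit values
    hi = np.where(hi > 7, hi - 16, hi)
    out = np.empty(p.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out

q = np.array([-8, -1, 0, 7, 3, -5], dtype=np.int8)
packed = pack_int4(q)          # 3 bytes for 6 values
restored = unpack_int4(packed)
```

Any mismatch between the packer's convention and the kernel's (nibble order, signedness, group size) shows up exactly as the kind of wrong-data bug described above.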
-
@city96 I noticed that the data in the flux dev and schnell Q8_0 GGUFs is in f16/q8_0, but shouldn't it be f32/q8_0?
Flux in Q8_0:
![image](https://github.com/user-attachments/assets/4a0d6f16-882…
-
I get the following error when starting inference:
```shell
Traceback (most recent call last):
File "H:\forge\webui\modules_forge\main_thread.py", line 30, in work
self.result = self.func(*self.args,…
```