-
### Motivation.
Higher throughput and memory savings are always cool 😎
I think this could be integrated fairly easily; what do you think of its design?
### Proposed Change.
https://github.com…
-
When converting [nemolita-21b](https://huggingface.co/win10/nemolita-21b), which is a merged model, `convert.py` fails with this error:
```shell
Traceback (most recent call last):
File "/hom…
```
-
Background:
The [spin quant paper](https://arxiv.org/pdf/2405.16406) introduces a method of improving quantization by adding additional rotation matrices to the model weights that improve quantizatio…
-
Small image (mind the copyright)
![quant_img2](https://github.com/YoungHaKim7/Cpp_Training/assets/67513038/7e4ac027-f6ca-4679-a09b-982431447afa)
-
How much GPU memory is needed to quantize flux-dev?
Can it be offloaded to the CPU when there is not enough GPU memory?
The following part of your input was truncated because CLIP can only handle sequences up to 77…
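For rough sizing: the FLUX.1-dev transformer has about 12B parameters, so its bf16 weights alone are on the order of 24 GB before activations and text encoders. In diffusers, `pipe.enable_model_cpu_offload()` (which requires `accelerate`) keeps only the component currently running on the GPU, so peak VRAM drops to roughly the largest single component. The same idea can be mimicked by hand with forward hooks; the tiny model below is made up purely for illustration:

```python
import torch

def add_offload_hooks(module: torch.nn.Module, device: str) -> None:
    """Keep `module` on CPU; move it to `device` only for its forward pass."""
    def pre(mod, inputs):
        mod.to(device)
        return tuple(t.to(device) for t in inputs)

    def post(mod, inputs, output):
        mod.to("cpu")
        return output.to("cpu")

    module.register_forward_pre_hook(pre)
    module.register_forward_hook(post)

device = "cuda" if torch.cuda.is_available() else "cpu"
net = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4)
)
for layer in net:
    add_offload_hooks(layer, device)

y = net(torch.randn(2, 8))  # layers visit the accelerator one at a time
```

After the forward pass every layer is back on the CPU, so only one layer's weights occupy device memory at any moment, at the cost of per-layer transfer overhead.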
-
# Quantize the model
model_prepared = tq.prepare(model_fused)
model_quantized = tq.convert(model_prepared)
# Define the quantization configuration
quant_config = tq.get_default_qconfig('fbge…
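For reference, in PyTorch eager-mode post-training quantization the qconfig has to be attached *before* `prepare`, and a calibration pass is needed between `prepare` and `convert`; calling `prepare` on a model with no qconfig inserts no observers, so nothing actually gets quantized. A minimal self-contained sketch (the tiny module and calibration data are made up for illustration):

```python
import torch
import torch.ao.quantization as tq

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized boundary
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = tq.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyModel().eval()
# 1. Define the quantization configuration (before prepare!)
model.qconfig = tq.get_default_qconfig("fbgemm")
# 2. Insert observers
model_prepared = tq.prepare(model)
# 3. Calibrate with representative data
model_prepared(torch.randn(8, 4))
# 4. Convert observers + weights to the quantized model
model_quantized = tq.convert(model_prepared)
out = model_quantized(torch.randn(2, 4))
```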
-
Version: `kallisto 0.51.1`
I'm following a workflow outlined in [issue 456](https://github.com/pachterlab/kallisto/issues/456) for using lr-kallisto with bulk ONT. `kallisto bus`, `bustools sort`, …
-
### 🐛 Describe the bug
In the `embedding_4bit` implementation [here](https://github.com/pytorch/executorch/blob/main/exir/passes/_quant_patterns_and_replacements.py#L213), it assumes the quantized da…
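Packed 4-bit weights store two signed nibbles per byte, so the unpack step has to sign-extend each nibble; if a kernel assumes the wrong nibble order or an unsigned layout, the decoded values are garbage. A generic low-nibble-first pack/round-trip sketch in NumPy (this packing convention is an assumption for illustration, not necessarily the one the ExecuTorch pass uses):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack int8 values in [-8, 7] (even length) two-per-byte, low nibble first."""
    u = q.astype(np.uint8) & 0x0F          # two's-complement nibbles
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(p: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: split each byte and sign-extend the nibbles."""
    lo = (p & 0x0F).astype(np.int8)
    hi = ((p >> 4) & 0x0F).astype(np.int8)
    lo = np.where(lo > 7, lo - 16, lo)     # sign-extend 4-bit values
    hi = np.where(hi > 7, hi - 16, hi)
    out = np.empty(p.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out

q = np.array([-8, -1, 0, 7, 3, -5], dtype=np.int8)
packed = pack_int4(q)          # 3 bytes for 6 values
restored = unpack_int4(packed)
```

Any mismatch between the packer's convention and the kernel's (nibble order, signedness, group size) shows up exactly as the kind of wrong-data bug described above.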
-
@city96 I noticed that the data in the flux dev and schnell Q8_0 GGUFs is in f16/q8_0, but shouldn't it be f32/q8_0?
Flux in Q8_0:
![image](https://github.com/user-attachments/assets/4a0d6f16-882…
-
I get the following error when starting inference:
```shell
Traceback (most recent call last):
File "H:\forge\webui\modules_forge\main_thread.py", line 30, in work
self.result = self.func(*self.args,…
```