-
I use the following code to convert the internal-state model to TFLite:
```
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
conv…
```
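For context, a full-integer post-training quantization setup typically looks like the sketch below; the SavedModel path, input shape, and the body of `representative_dataset_gen` are placeholder assumptions, not the poster's actual model or calibration data.
```python
# Sketch of full-integer PTQ. `saved_model_dir`, the input shape, and the
# calibration generator are placeholders, not the poster's actual setup.
import tensorflow as tf

def representative_dataset_gen():
    # Yield a handful of batches matching the model's input signature.
    for _ in range(100):
        yield [tf.random.normal([1, 128], dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
# Optionally force int8 kernels and int8 input/output tensors.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```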
-
TensorRT-LLM has great potential to let people run larger models efficiently on limited hardware. Unfortunately, the current quantization workflow requires significant computation…
-
I'd like to raise a concern about how quantization is currently handled in SpeechBrain. While training my own k-means quantizer on the last layer of an ASR model, I noticed that the interface was not …
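For readers unfamiliar with the setup, a k-means quantizer over last-layer features can be sketched with scikit-learn as below; this is a generic illustration, not SpeechBrain's interface, and `encoder`/`batches` are hypothetical placeholders.
```python
# Generic k-means quantizer over last-layer features (not SpeechBrain's API).
# `encoder` and `batches` are hypothetical: encoder(batch) -> [T, D] array.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def collect_features(encoder, batches):
    feats = [encoder(batch) for batch in batches]   # each [T, D]
    return np.concatenate(feats, axis=0)            # stacked [N, D]

def train_quantizer(features, n_units=512):
    km = MiniBatchKMeans(n_clusters=n_units, batch_size=1024)
    km.fit(features)
    return km

def quantize(km, features):
    # Map frame-level features to discrete unit IDs.
    return km.predict(features)
```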
-
%%capture
!pip install unsloth "xformers==0.0.28.post2"
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://gi…
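After installation, a typical Unsloth workflow loads a 4-bit model and attaches LoRA adapters; the sketch below follows Unsloth's documented `FastLanguageModel` API, with the checkpoint name and LoRA hyperparameters chosen as examples rather than taken from the post.
```python
# Sketch based on Unsloth's documented API; the checkpoint name and LoRA
# hyperparameters below are example choices, not taken from the post.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",  # example checkpoint
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit weights so the model fits on a small Colab GPU
)

# Attach LoRA adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```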
-
https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8
https://github.com/vllm-project/llm-compressor/tre…
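The linked FP8 (W8A8) example boils down to a one-shot quantization pass along these lines; module paths and argument names may differ between llm-compressor releases, so treat this as a sketch rather than the exact example script.
```python
# Rough outline of the linked W8A8-FP8 example; module paths and arguments may
# differ between llm-compressor releases, so treat this as a sketch.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot  # `from llmcompressor import oneshot` in newer releases

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Dynamic FP8 needs no calibration set: weights are quantized offline and
# activation scales are computed per token at runtime.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

SAVE_DIR = "Llama-3.1-8B-Instruct-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```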
-
Hello authors,
Thank you for your excellent work.
I've tried utilizing AIMET to resolve a severe performance degradation issue caused by quantization while using the SNPE library. However, I've …
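For reference, AIMET's post-training quantization simulation is usually driven roughly as sketched below; the toy model, input shape, calibration data, and export paths are placeholder assumptions, and exact argument names may vary across AIMET releases.
```python
# Sketch of AIMET's QuantizationSimModel flow; the toy model, input shape,
# calibration data, and export paths are placeholders, and argument names may
# vary across AIMET releases.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 224, 224)
calibration_batches = [torch.randn(1, 3, 224, 224) for _ in range(8)]

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,   # int8 weights
    default_output_bw=8,  # int8 activations
)

def calibrate(sim_model, batches):
    # Forward a few batches so AIMET can compute activation encodings.
    with torch.no_grad():
        for batch in batches:
            sim_model(batch)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=calibration_batches)

# Export the model plus encodings, which SNPE's converter can consume.
sim.export(path="./export", filename_prefix="model_int8", dummy_input=dummy_input)
```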
-
Details of the image-generation code:
```python
from diffusers import FluxTransformer2DModel
import torch
def load_flux_model(
    model_path: str,
    load_from_file: bool = True,
    dtype: …
```
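The function above is cut off, so here is a separate, hypothetical sketch (`load_transformer` is my own placeholder name, not the poster's function) of the two loading paths its signature suggests, using diffusers' `from_single_file` and `from_pretrained` loaders.
```python
# Hypothetical sketch (my own `load_transformer`, not the poster's function)
# of the two loading paths the truncated signature suggests. The repo ID is an
# example value.
import torch
from diffusers import FluxTransformer2DModel

def load_transformer(model_path: str, load_from_file: bool, dtype=torch.bfloat16):
    if load_from_file:
        # A single .safetensors checkpoint exported outside the diffusers layout.
        return FluxTransformer2DModel.from_single_file(model_path, torch_dtype=dtype)
    # Standard diffusers layout: the transformer sits in the "transformer" subfolder.
    return FluxTransformer2DModel.from_pretrained(
        model_path, subfolder="transformer", torch_dtype=dtype
    )

transformer = load_transformer("black-forest-labs/FLUX.1-dev", load_from_file=False)
```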
-
### Search before asking
- [x] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
## 🐛 Bug
I'm looking at generating an int8 quantised PyTorch model (both weights and activations at int8) and exporting it to StableHLO via `torch-xla`'s `exported_program_to_stablehlo`.
Right no…
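The flow being described would look roughly like the sketch below: PT2E static int8 quantization of weights and activations, followed by lowering to StableHLO. The toy model, the choice of `XNNPACKQuantizer`, and the capture call are assumptions, and the capture/export API names shift between PyTorch releases.
```python
# Outline of the described flow: PT2E static int8 quantization (weights and
# activations) followed by lowering to StableHLO. The toy model, quantizer
# choice, and capture call are assumptions; capture/export APIs move between
# PyTorch releases.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from torch_xla.stablehlo import exported_program_to_stablehlo

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# Capture an FX graph (export_for_training on recent PyTorch; older versions
# used capture_pre_autograd_graph instead).
captured = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)            # calibration pass
quantized = convert_pt2e(prepared)

# Re-export the quantized graph and lower it to StableHLO.
exported = torch.export.export(quantized, example_inputs)
stablehlo_program = exported_program_to_stablehlo(exported)
print(stablehlo_program.get_stablehlo_text())
```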
-
### Description of the bug:
I tried running the example.py script provided for the quantization example, but for Llama. Wherever Gemma was referenced, I made the corresponding changes for Llama. The…