-
## 🐛 Bug
When FP8 quantization is applied to the TinyLlama-1.1B-Chat-v1.0 model, the model's responses consist exclusively of the simplified Chinese character `给`. `mlc chat` with the quantized…
-
FP8 model with CLIP, TextEncoder, and VAE reloads from scratch on every generation
-
Is there any possibility of adding FP8 support to all the other parsers?
[comfyanonymous/ComfyUI#2157](https://github.com/comfyanonymous/ComfyUI/issues/2157)
Thanks!
-
Running Comfy with ZLUDA. System is Windows 10, Ryzen 9 3900X, 32 GB RAM, RX 5700 XT with 8 GB VRAM. Each time I try to run an NF4 model, I get the error below and Comfy just closes. FP8 models run fine. I…
-
I am trying vlm_ptq by following the README in the vlm_ptq folder, and when I run the command `scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8`, the following error m…
-
# Release Manager
@cp5555
# Endgame
- [x] Code freeze: Feb. 9th, 2024
- [x] Bug Bash date: Feb. 12th, 2024
- [x] Release date: Feb. 23rd, 2024
# Main Features
## MS-AMP O3 Optimization
-…
-
Please add support for the Flux + Kolors model. Thanks!
-
Hi,
when I try to implement a cuBLASLt FP8 batched GEMM with bias based on LtFp8Matmul, I run into this problem:
```
[2024-05-22 07:06:23][cublasLt][62029][Error][cublasLtMatmulAlgoGetHeuristic] Failed t…
-
Hi, when testing the wrapper I got results that faded to black and showed a raster pattern, as you can see in the MP4s. What could cause this? I am running this at 384p with FP8, for both image-to-video and text-to-video, with …
-
Hi experts, I tried to use Transformer Engine to measure the FLOPS a 4090 can achieve using FP8. I used te.Linear for my evaluation and got a maximum of only 150+ TFLOPS. For FP16, the maximum is only 80…
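
For context, TFLOPS figures in benchmarks like this are usually derived from the GEMM dimensions and the measured wall-clock time. A minimal sketch of that arithmetic (the shapes and timing below are illustrative, not the reporter's actual te.Linear setup):

```python
def matmul_tflops(m: int, n: int, k: int, elapsed_s: float, batch: int = 1) -> float:
    """A dense (m, k) x (k, n) matmul costs 2*m*n*k FLOPs
    (one multiply plus one add per output element per k-step)."""
    flops = 2 * m * n * k * batch
    return flops / elapsed_s / 1e12

# Illustrative: a 4096x4096x4096 GEMM finishing in 1 ms
print(round(matmul_tflops(4096, 4096, 4096, 1e-3), 1))  # → 137.4
```

When comparing against a GPU's peak FP8 rate, note that hardware datasheets often quote the rate with sparsity, which doubles the dense figure; that alone can make a measured dense result look like "only" half of peak.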