-
### Checklist
- [X] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [X] The issue is caused by an extension, but I believe it is caused b…
-
Is it planned?
Currently getting this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
-
Hello! Thank you very much for this FP8 rowwise matmul code, it has been extremely helpful. However, there is a subtle bug/hidden requirement when, e.g., calling this code here:
https://github.com/pytor…
-
Does this support storing the KV cache in FP8 or INT8 while computing in FP16? Reading the KV cache as INT8 should be faster than reading it as FP16; the INT8 values could then be converted to FP16 in shared memory before the computation.
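The pattern being asked about can be sketched as follows: store the cache in INT8 to halve memory traffic, then dequantize to higher precision (FP16 on a GPU; plain Python floats here) just before the attention matmul. All names below are illustrative, not TensorRT-LLM or any other library's API.

```python
# Minimal sketch: symmetric per-tensor INT8 quantization of a cached
# key vector, dequantized right before the query-key dot product.

def quantize_int8(values):
    """Float -> (int8 values, scale), symmetric per-tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """INT8 -> float; on a real GPU this happens in shared memory."""
    return [v * scale for v in q]

k_cache = [0.5, -1.25, 2.0, 0.125]      # one cached key vector
q_vec   = [1.0, 0.5, -0.5, 2.0]         # current query vector

q_int8, scale = quantize_int8(k_cache)  # stored form: 1 byte/element
k_deq = dequantize(q_int8, scale)       # restored just before the matmul

score = sum(a * b for a, b in zip(q_vec, k_deq))
exact = sum(a * b for a, b in zip(q_vec, k_cache))
print(score, exact)  # close, up to quantization error
```

The memory-bandwidth savings come from the 1-byte stored form; the arithmetic itself still runs at the higher precision after dequantization.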
-
On my system, I have enough VRAM (72 GB) to run Llama-3-70B in 4-bit or 8-bit precision. However, I am unable to quantize this model to either 4-bit or 8-bit precision using the scripts in TensorRT-LL…
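A back-of-envelope check of whether the quantized weights alone fit in that VRAM budget (this ignores the KV cache, activations, and quantization scale/zero-point overhead, which add several more GB):

```python
# Weight-only VRAM estimate for a 70B-parameter model at different
# precisions: params * bits / 8 bytes, reported in GB.
PARAMS = 70e9

def weight_gb(bits_per_param):
    return PARAMS * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(bits):.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB
```

So 72 GB fits 4-bit weights comfortably, while 8-bit (70 GB of weights before any overhead) is borderline; the quantization *process* itself can also require more memory than running the quantized model.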
-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If so, an example would …
-
Hi!
I tried sparsity with FP8 Llama-3-8B on an RTX 4090, but did not get a performance improvement. I checked the TRT-LLM build log, which shows that despite there being layers eligible to use sparse tactics, they…
-
Hello, we have measured FP8 GEMM performance using Triton on an NVIDIA H100 (500 W, 1980 MHz). We would like to request your help in understanding whether the performance is expected.
Since H100 FP8 o…
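For reference, achieved throughput in such comparisons is typically computed from the GEMM dimensions and the measured kernel time: an (M, K) × (K, N) matmul performs 2·M·N·K floating-point operations (one multiply plus one add per MAC). The dimensions and timing below are hypothetical, for illustration only.

```python
# Achieved TFLOPS from GEMM shape and measured wall time.
def achieved_tflops(m, n, k, seconds):
    return 2 * m * n * k / seconds / 1e12

# Hypothetical example: an 8192^3 FP8 GEMM measured at 0.7 ms.
t = achieved_tflops(8192, 8192, 8192, 0.7e-3)
print(f"{t:.0f} TFLOPS")  # -> 1571 TFLOPS
```

Comparing this number against the datasheet peak gives the utilization fraction; note that the dense FP8 peak (without sparsity) is the right baseline for a plain GEMM.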
-
What does fp8_unet do? Does enabling fp8_unet save VRAM?
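Roughly, storing the UNet weights in FP8 instead of FP16 drops each parameter from 2 bytes to 1 byte, so the weight memory is about halved. The parameter count below is an approximation for illustration, not an exact figure:

```python
# Rough weight-memory comparison for a UNet stored in FP16 vs FP8.
unet_params = 2.6e9  # approximate SDXL-scale UNet, for illustration

fp16_gb = unet_params * 2 / 1e9  # 2 bytes/param
fp8_gb  = unet_params * 1 / 1e9  # 1 byte/param
print(f"fp16: {fp16_gb:.1f} GB, fp8: {fp8_gb:.1f} GB")
```

Note that compute typically still happens at higher precision (weights are cast at use), so the saving is on weight storage, and there can be a small quality cost from the reduced precision.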
-