-
### Checklist
- [X] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [X] The issue is caused by an extension, but I believe it is caused b…
-
Is it planned?
Currently getting this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
-
Hello! Thank you very much for this FP8 rowwise matmul code, it has been extremely helpful. However, there is a subtle bug/hidden requirement when, e.g., calling this code here:
https://github.com/pytor…
-
Does this support storing the KV cache in FP8 or INT8 while computing in FP16? Reading the KV cache as INT8 should be faster than reading it as FP16; the INT8 values could then be converted to FP16 in shared memory before the computation.
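The pattern being asked about can be sketched as follows: store the cache in INT8 to halve memory traffic, then dequantize to higher precision (FP16 on a GPU; plain Python floats here) just before the attention matmul. All names below are illustrative, not TensorRT-LLM or any other library's API.

```python
# Minimal sketch: symmetric per-tensor INT8 quantization of a cached
# key vector, dequantized right before the query-key dot product.

def quantize_int8(values):
    """Float -> (int8 values, scale), symmetric per-tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """INT8 -> float; on a real GPU this happens in shared memory."""
    return [v * scale for v in q]

k_cache = [0.5, -1.25, 2.0, 0.125]      # one cached key vector
q_vec   = [1.0, 0.5, -0.5, 2.0]         # current query vector

q_int8, scale = quantize_int8(k_cache)  # stored form: 1 byte/element
k_deq = dequantize(q_int8, scale)       # restored just before the matmul

score = sum(a * b for a, b in zip(q_vec, k_deq))
exact = sum(a * b for a, b in zip(q_vec, k_cache))
print(score, exact)  # close, up to quantization error
```

The memory-bandwidth savings come from the 1-byte stored form; the arithmetic itself still runs at the higher precision after dequantization.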
-
On my system, I have enough VRAM (72 GB) to run Llama-3-70B in 4-bit or 8-bit precision. However, I am unable to quantize this model to either 4-bit or 8-bit precision using the scripts in TensorRT-LL…
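A back-of-envelope check of whether the quantized weights alone fit in that VRAM budget (this ignores the KV cache, activations, and quantization scale/zero-point overhead, which add several more GB):

```python
# Weight-only VRAM estimate for a 70B-parameter model at different
# precisions: params * bits / 8 bytes, reported in GB.
PARAMS = 70e9

def weight_gb(bits_per_param):
    return PARAMS * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(bits):.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB
```

So 72 GB fits 4-bit weights comfortably, while 8-bit (70 GB of weights before any overhead) is borderline; the quantization *process* itself can also require more memory than running the quantized model.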
-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If so, an example would …
-
Hi!
I tried sparsity with FP8 Llama-3-8B on an RTX 4090, but did not get a performance improvement. I checked the TRT-LLM build log, which shows that despite there being layers eligible to use sparse tactics, they…
-
Hello, we have measured FP8 GEMM performance using Triton on an NVIDIA H100 (500 W, 1980 MHz). We would like to request your help in understanding whether the performance is expected.
Since H100 FP8 o…
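For reference, achieved throughput in such comparisons is typically computed from the GEMM dimensions and the measured kernel time: an (M, K) × (K, N) matmul performs 2·M·N·K floating-point operations (one multiply plus one add per MAC). The dimensions and timing below are hypothetical, for illustration only.

```python
# Achieved TFLOPS from GEMM shape and measured wall time.
def achieved_tflops(m, n, k, seconds):
    return 2 * m * n * k / seconds / 1e12

# Hypothetical example: an 8192^3 FP8 GEMM measured at 0.7 ms.
t = achieved_tflops(8192, 8192, 8192, 0.7e-3)
print(f"{t:.0f} TFLOPS")  # -> 1571 TFLOPS
```

Comparing this number against the datasheet peak gives the utilization fraction; note that the dense FP8 peak (without sparsity) is the right baseline for a plain GEMM.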
-
What does fp8_unet do? Does enabling fp8_unet save VRAM?
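Roughly, storing the UNet weights in FP8 instead of FP16 drops each parameter from 2 bytes to 1 byte, so the weight memory is about halved. The parameter count below is an approximation for illustration, not an exact figure:

```python
# Rough weight-memory comparison for a UNet stored in FP16 vs FP8.
unet_params = 2.6e9  # approximate SDXL-scale UNet, for illustration

fp16_gb = unet_params * 2 / 1e9  # 2 bytes/param
fp8_gb  = unet_params * 1 / 1e9  # 1 byte/param
print(f"fp16: {fp16_gb:.1f} GB, fp8: {fp8_gb:.1f} GB")
```

Note that compute typically still happens at higher precision (weights are cast at use), so the saving is on weight storage, and there can be a small quality cost from the reduced precision.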
-