-
### 🚀 The feature, motivation and pitch
Sharing a repro for @bdhirsh, @tugsbayasgalan on the gaps of torch.compile for FSDP2 fp8 all-gather.
For FSDP2 fp8 all-gather, it's critical to pre-compute ama…
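For context, a minimal sketch of what "pre-computing" the amax/scale for an fp8 all-gather roughly involves, assuming per-tensor scaling over the FSDP sharding group; the function names and the clamp value are illustrative, not the torchao/float8 implementation:

```python
import torch
import torch.distributed as dist

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def precompute_fp8_scale(local_shard: torch.Tensor, group=None) -> torch.Tensor:
    """Compute a single fp8 scale that every rank in the FSDP group agrees on."""
    amax = local_shard.abs().max().float()
    # Every rank must use the same scale before casting, otherwise the
    # all-gathered fp8 shards cannot be decoded with one per-tensor scale.
    dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=group)
    return E4M3_MAX / torch.clamp(amax, min=1e-12)

def cast_shard_for_all_gather(local_shard: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Cast the local shard to fp8 using the pre-computed scale, ready to all-gather."""
    return (local_shard.float() * scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
```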
-
### Describe the issue
When I use gemm_float8 with input A (fp8 e5m2) and input B (fp8 e4m3), it cannot run, but with input A (fp8 e4m3) and input B (fp8 e4m3) it runs correctly.
### To reproduce
run gemm_floa…
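Since the original repro command is truncated above, here is a hedged, analogous check in plain PyTorch using `torch._scaled_mm` as a stand-in (not the `gemm_float8` entry point itself); the shapes, scales, and transpose convention are illustrative, and whether the e5m2 × e4m3 pairing is accepted depends on the backend's supported fp8 dtype combinations:

```python
import torch

M, K, N = 64, 64, 64  # multiples of 16, as fp8 GEMMs typically require
a_hp = torch.randn(M, K, device="cuda")
b_hp = torch.randn(N, K, device="cuda")  # transposed below to get a column-major operand

scale_a = torch.tensor(1.0, device="cuda")
scale_b = torch.tensor(1.0, device="cuda")

# Case 1: A in e4m3, B in e4m3 -- reported to work.
a = a_hp.to(torch.float8_e4m3fn)
b = b_hp.to(torch.float8_e4m3fn)
out = torch._scaled_mm(a, b.t(), scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)

# Case 2: A in e5m2, B in e4m3 -- the reported failing combination.
a = a_hp.to(torch.float8_e5m2)
out = torch._scaled_mm(a, b.t(), scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)
```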
-
I have seen that AutoFP8-quantized models from Hugging Face, especially Mixtral-8x7B-FP8, are supported by vLLM. I am wondering whether models with both the kv_cache and the weights quantized by AutoFP8 are …
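For reference, a minimal sketch of loading such a checkpoint with vLLM's offline `LLM` API, combining fp8 weight quantization with an fp8 KV cache; the model path is a placeholder for the AutoFP8 checkpoint mentioned above, and whether both work together for a given model is exactly the open question:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Mixtral-8x7B-FP8",   # local path or Hub id of the AutoFP8 checkpoint
    quantization="fp8",         # fp8 weight quantization
    kv_cache_dtype="fp8",       # fp8 KV cache on top of the fp8 weights
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```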
-
### Env
- Inside docker, `nvcr.io/nvidia/pytorch:24.06-py3`
- L20 GPU, Driver Version: 550.90.07, CUDA Version: 12.4
- TensorRT 10.1.0
### Steps
1. Make plugins and copy the `plugins` folder…
-
# 🚀 Feature
FP8 is very useful for LLM training and inference. Does xformers support FP8?
Thank you~
-
https://github.com/huggingface/text-generation-inference/blob/d0225b10156320f294647ac676c130d03626473d/server/text_generation_server/layers/fp8.py#L4
@Narsil what do you think about enabling torch.…
-
**Your question**
I'm trying to train GPT/LLaMA on top of Megatron-LM, but I'm confused about FP8 performance.
Setting the FP8 format parameters together with `--bf16` is much better than the situation witho…
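For background, a minimal sketch of the mechanism behind Megatron-LM's FP8 flags, assuming TransformerEngine's PyTorch API: FP8 only replaces the GEMMs inside TE modules, while the surrounding parameters and activations stay in bf16, which is why the FP8 options are normally combined with `--bf16`. The recipe values below are illustrative, not Megatron-LM's exact defaults:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# "Hybrid" delayed scaling: e4m3 in the forward pass, e5m2 for gradients.
recipe = DelayedScaling(fp8_format=Format.HYBRID,
                        amax_history_len=1024,
                        amax_compute_algo="max")

layer = te.Linear(4096, 4096, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)  # the GEMM runs in fp8; inputs, outputs, and params stay bf16
y.sum().backward()
```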
-
### Feature request
Hi!
Could anyone please help me with using Hugging Face models (LLaMA [or, if LLaMA is difficult, MPT-7B]) with TransformerEngine (TE) FP8 inference? We really need the speedup
…
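One hedged sketch of how people typically wire a Hugging Face model into TE FP8 inference: swap `nn.Linear` modules for `te.Linear` (copying the weights), then run a forward pass under `fp8_autocast`. This is an illustration under assumptions, not an officially supported integration; the model id and the helper function are hypothetical, and FP8 execution imposes shape constraints (token count divisible by 8, hidden sizes divisible by 16):

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te
from transformers import AutoModelForCausalLM, AutoTokenizer

def swap_linear_for_te(module: nn.Module) -> None:
    """Recursively replace nn.Linear submodules with te.Linear, copying weights."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            te_linear = te.Linear(child.in_features, child.out_features,
                                  bias=child.bias is not None,
                                  params_dtype=torch.bfloat16)
            with torch.no_grad():
                te_linear.weight.copy_(child.weight)
                if child.bias is not None:
                    te_linear.bias.copy_(child.bias)
            setattr(module, name, te_linear)
        else:
            swap_linear_for_te(child)

tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")          # illustrative model id
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b",
                                             torch_dtype=torch.bfloat16)
swap_linear_for_te(model)
model.cuda()

# 16 tokens keeps the flattened token dimension divisible by 8 for fp8 GEMMs.
ids = torch.randint(0, tok.vocab_size, (1, 16), device="cuda")
with torch.no_grad(), te.fp8_autocast(enabled=True):
    logits = model(ids).logits
print(logits.shape)
```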
-
FP8 is very useful for LLM training and inference. Does flash attention support FP8?
Thank you~
-
**We see that for FP8 GEMM only TNN is supported in the cutlass_profiler-generated kernels, and in the CUTLASS examples directory as well. Are there any FP8 kernels with other layouts, like TTT/TTN, shipp…