-
**Your question**
I'm trying to train GPT/LLaMA on top of Megatron-LM, but I'm confused about FP8 performance.
Setting the FP8 format parameters together with "--bf16" performs much better than the situation witho…
-
FP8 is very useful for both training and inference of LLMs. Does FlashAttention support FP8?
Thank you~
-
**We see that for FP8 GEMM only TNN is supported in the cutlass_profiler-generated kernels, and in the CUTLASS examples directory as well. Are there any FP8 kernels with other layouts like TTT/TTN shipp…
-
I get this error message if I set max_len to 300, or anything higher than 100 for that matter, whenever I try to train with FP8. I'm using cuda-12.4.0-2 and the nightly CUDA 12.4 PyTorch builds an…
-
There is a use_fp flag for the offline_quantize tool in saxml/tool to quantize the weights to fp8, but they still have to be stored as int8 (https://github.com/google/praxis/blob/3f4cbb4bcda366db7b018695fbe2d4…
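For context, keeping fp8 values in an int8 buffer is just a byte-level reinterpretation, since both types are one byte wide. A minimal NumPy sketch, assuming `ml_dtypes` for the e4m3 type (this is an illustration, not the linked tool's actual code path):
```python
import numpy as np
import ml_dtypes  # provides float8_e4m3fn as a NumPy dtype

# Quantize float32 weights to fp8 (e4m3), then reinterpret the same bytes as int8
# so they can live in an int8-typed checkpoint buffer.
w = np.random.randn(4, 4).astype(np.float32)
w_fp8 = w.astype(ml_dtypes.float8_e4m3fn)   # real fp8 values, 1 byte each
w_int8_storage = w_fp8.view(np.int8)        # same bits, int8 container

# To use the weights again, view the int8 buffer back as fp8 and upcast.
w_restored = w_int8_storage.view(ml_dtypes.float8_e4m3fn).astype(np.float32)
```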
-
Hi, how do I cast a float/bfloat16 tensor to fp8? I want to do W8A8 (fp8) quantization, but I didn't find an example of quantizing activations to the FP8 format.
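A minimal PyTorch sketch of per-tensor quantization to fp8, assuming `torch.float8_e4m3fn` is available (PyTorch 2.1 or later); the scaling recipe here is illustrative, not the only option:
```python
import torch

def quantize_to_fp8(x: torch.Tensor):
    """Per-tensor symmetric quantization of a float tensor to fp8 (e4m3)."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max          # 448 for e4m3
    scale = x.abs().amax().clamp(min=1e-12) / fp8_max       # per-tensor scale
    x_fp8 = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale                                      # keep scale for dequant

x = torch.randn(16, 32, dtype=torch.bfloat16)
x_fp8, scale = quantize_to_fp8(x.float())
x_dequant = x_fp8.to(torch.float32) * scale                  # approximate original
```
For W8A8, the same scheme would be applied offline to the weights and on the fly to the activations, with both scales carried into the matmul.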
-
### Feature request
I see that release version 1.12 supports FP8, but I didn't see any example code for how to train an LLM using FP8.
How can I use FP8 to train a model?
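For reference, a minimal sketch of what FP8 training of a single layer typically looks like with NVIDIA Transformer Engine's PyTorch API; whether that is the integration release 1.12 refers to is an assumption on my part:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed-scaling FP8 recipe: e4m3 for the forward pass, e5m2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

layer = te.Linear(1024, 1024, bias=True).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(32, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)            # GEMM runs in FP8; master weights stay high precision

loss = out.float().pow(2).mean()  # loss/backward outside the fp8 context
loss.backward()
optimizer.step()
```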
### Motivation
I want t…
-
### System Info
GPU - A10
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in the `…
-
Since Ada GPUs like the 4090 limit FP8 arithmetic to `fp32` accumulation, it only achieves the same max `TFLOPs` as `fp16xfp16` with `fp16` accumulation.
Furthermore, according to my test,…
-
Hi again,
I've successfully quantized an ONNX model to int8, then converted it to a TensorRT engine and noticed the performance increase compared to fp16.
```bash
python -m modelopt.onnx.quantizati…