-
Generating an image in fp8 mode reports the error:
Error occurred when executing T5TextEncode #ELLA:
"index_select_cuda" not implemented for 'Float8_e5m2'
-
### System Info
CPU architecture: x86_64
Host RAM: 1TB
GPU: 8xH100 SXM
Container: manually built from Dockerfile.trt_llm_backend with TRT 9.3
TensorRT-LLM version: 0.10.0.dev2024043000
Dr…
-
Describe the issue:
I set the weight/activation type to QuantType.QFLOAT8E4M3FN when calling quantize_static, but I get the following errors:
```
Traceback (most recent call last):
File "/home/developer/wor…
-
I want to set tp size = 2 and the global world size = 2.
The code is:
```
import os
import sys
import subprocess
import argparse
import torch
import torch.distributed as dist
import…
```
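For comparison, a minimal sketch of a 2-way tensor-parallel setup in plain torch.distributed where tp size equals the world size; launch with torchrun, and the column-parallel matmul is purely illustrative:
```
# Launch with: torchrun --nproc_per_node=2 tp_demo.py
import torch
import torch.distributed as dist

dist.init_process_group("nccl")          # world_size=2 comes from torchrun
rank = dist.get_rank()
torch.cuda.set_device(rank)

TP_SIZE = 2                              # tensor-parallel degree == world size
tp_group = dist.new_group(ranks=list(range(TP_SIZE)))

# Column-parallel linear: every rank sees the same input and owns
# one slice of the output columns of the weight.
torch.manual_seed(0)                     # same data generated on every rank
x = torch.randn(8, 1024, device="cuda")
w_full = torch.randn(1024, 1024, device="cuda")
w_shard = w_full[:, rank * 512 : (rank + 1) * 512]   # this rank's columns
local = x @ w_shard

# Gather the shards so every rank ends up with the full (8, 1024) output.
parts = [torch.empty_like(local) for _ in range(TP_SIZE)]
dist.all_gather(parts, local, group=tp_group)
y = torch.cat(parts, dim=-1)

dist.destroy_process_group()
```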
-
Loss is NaN in the very first training batch when the transformer architecture uses [rotary embedding](https://github.com/lucidrains/rotary-embedding-torch).
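For reference, the library's usual wiring looks like the sketch below (shapes illustrative). One common thing to check with first-batch NaNs is whether the rotary math runs in fp16, though that is only an assumption here:
```
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary = RotaryEmbedding(dim=32)     # rotates half of a 64-dim head
q = torch.randn(1, 8, 1024, 64)      # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 1024, 64)

# Rotations are applied to queries and keys before the attention dot product.
q = rotary.rotate_queries_or_keys(q)
k = rotary.rotate_queries_or_keys(k)
```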
-
### System Info
- GPU name: L40s
- CUDA: 12.1
```
Wed Jun 5 16:27:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 …
```
-
### 🚀 The feature, motivation and pitch
I found that some kernels use 32-bit integers as indices, which can easily overflow. I think changing them to int64_t (or another 64-bit type) would be safer…
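To illustrate the failure mode, a PyTorch-level sketch of the same wraparound (not the kernel code itself):
```
import torch

INT32_MAX = 2**31 - 1  # largest flat offset a 32-bit index can address

i = torch.tensor([INT32_MAX], dtype=torch.int32)
print(i + 1)                  # tensor([-2147483648], dtype=torch.int32): silent wraparound
print(i.to(torch.int64) + 1)  # tensor([2147483648]): correct once widened *before* the add
```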
-
ENV: 8x RTX 4090
I want to test FP8 inference with TransformerEngine on Llama 3 (from Hugging Face), but I cannot find any instructions for inference. Can you share some code?
Thank you~
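There is no single official Llama 3 inference recipe that I know of, but the usual Transformer Engine pattern is to swap in te.Linear (or te.TransformerLayer) modules and run the forward pass under fp8_autocast. A minimal sketch with illustrative shapes, not Llama 3 itself:
```
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: HYBRID = E4M3 forward, E5M2 backward;
# only the forward format matters for inference.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# Stand-in for a model layer; in practice the model's nn.Linear layers
# would be replaced with te.Linear (dims must be multiples of 16 for FP8).
layer = te.Linear(4096, 4096, bias=False, params_dtype=torch.bfloat16).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```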
-
Hi! Are there plans to add an FP8 Transformer Engine (H100) speedup to inference?
If not, could you please give me an outline of what needs to be done in order for me to work on that?
Thank you!
-
SOTA (cuBLAS, CUTLASS) FP8 GEMM kernels perform poorly in the small-M regime (M = bs*seq_len < 32).
This work will focus on leveraging the performant pieces of the [Marlin](https://github.com/IST-D…
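The small-M problem follows from arithmetic intensity: at M < 32 the GEMM is dominated by reading the weight matrix, so big-tile cuBLAS/CUTLASS kernels sit memory-bound. A rough back-of-envelope sketch (byte counts simplified, fp8 output assumed):
```
def intensity(M, K, N):
    """FLOPs per byte for a (M,K) x (K,N) GEMM with 1-byte (fp8) operands."""
    flops = 2 * M * K * N
    bytes_moved = M * K + K * N + M * N   # A + B + C, one byte per element
    return flops / bytes_moved

for M in (1, 16, 32, 4096):
    print(f"M={M:5d}  ~{intensity(M, 4096, 4096):7.1f} FLOPs/byte")
```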