-
With an NF4 model at 1024 × 1024 resolution, a 10-series or 20-series 8 GB graphics card takes about four minutes to generate a single image.
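NF4 stores weights as 4-bit codes plus a per-block scale, which is how large models fit on 8 GB cards at the cost of extra dequantization work per layer. A minimal pure-Python sketch of the general blockwise absmax 4-bit scheme (uniform levels for illustration — NOT the actual NF4 codebook):

```python
# Illustrative blockwise 4-bit absmax quantization (uniform grid, not the
# real NF4 code values): each block stores one float scale + 4-bit indices.

def quantize_block(block):
    """Quantize one block of floats to signed 4-bit levels in [-7, 7]."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0
    return scale, [round(x / scale) for x in block]

def dequantize_block(scale, codes):
    return [c * scale for c in codes]

block = [0.02, -0.7, 0.35, 0.0]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
# round-trip error is bounded by half a quantization step (scale / 2)
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(block, restored))
```

The real NF4 codebook replaces the uniform grid with 16 levels tuned for normally distributed weights, but the storage layout (codes + per-block scale) is the same idea.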
-
Hi!
I'm getting the following error when trying to use Transformer Engine: 100 errors detected in the compilation of "transformer_engine/common/transpose/rtc/cast_transpose.cu".
Compilation terminat…
-
-
### 🐛 Describe the bug
```python
import torch
import torch._inductor.config
torch._inductor.config.force_mixed_mm = True
def f(a, b):
    return torch.mm(a, b.to(a.dtype))

fp16_act = torc…
```
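What `force_mixed_mm` targets is, conceptually, a matmul whose second operand is stored in a narrower dtype and upcast on the fly instead of being materialized first. A pure-Python sketch of that dequantize-then-multiply pattern (the function name and the single per-tensor scale are illustrative, not the inductor's actual lowering):

```python
# Illustrative mixed-dtype matmul: `a` is float, `w_int8` holds quantized
# weights, `scale` is a hypothetical per-tensor dequantization factor.

def dequant_mm(a, w_int8, scale):
    """a: float matrix (list of rows), w_int8: int matrix, scale: float."""
    k, n = len(w_int8), len(w_int8[0])
    return [
        [sum(row[i] * w_int8[i][j] * scale for i in range(k)) for j in range(n)]
        for row in a
    ]

a = [[1.0, 2.0]]
w = [[10, -20], [30, 40]]
print(dequant_mm(a, w, 0.1))  # → [[7.0, 6.0]]
```

The fused kernel avoids writing the dequantized copy of `w` to memory, which is where the speedup at small batch sizes comes from.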
-
### 🚀 The feature, motivation and pitch
This is moving "issue 2" from https://github.com/pytorch/pytorch/issues/130015 to be tracked separately.
**Context**:
While using…
vkuzo updated 1 month ago
-
Hello, we have measured FP8 GEMM performance using Triton on an NVIDIA H100 (500 W, 1980 MHz). We would appreciate your help in understanding whether this performance is expected.
Since H100 FP8 o…
sryap updated 3 months ago
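A back-of-envelope way to sanity-check such measurements is to convert a timing into achieved TFLOP/s and compare against the card's peak. A sketch with hypothetical numbers — the peak figure below is an assumption for illustration, and should be adjusted for the actual clocks and power cap:

```python
# Hedged sketch: GEMM efficiency from a wall-clock timing.

def gemm_tflops(m, n, k, seconds):
    """Achieved TFLOP/s for an M x N x K GEMM (2*M*N*K FLOPs)."""
    return 2.0 * m * n * k / seconds / 1e12

# Hypothetical measurement: an 8192^3 FP8 GEMM timed at 1.0 ms.
achieved = gemm_tflops(8192, 8192, 8192, 1.0e-3)
assumed_peak = 1979.0  # ASSUMED dense-FP8 peak in TFLOP/s; not a measured spec
print(f"{achieved:.1f} TFLOP/s, {achieved / assumed_peak:.0%} of assumed peak")
```

Note that a 500 W power limit will throttle sustained clocks below the boost figure, so the practically reachable peak is lower than the datasheet number.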
-
I want to set TP size = 2 with a global world size = 2.
The code is:
```python
import os
import sys
import subprocess
import argparse
import torch
import torch.distributed as dist
import…
```
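For that configuration, the rank-to-group bookkeeping is simple: with `tp_size == world_size == 2`, all ranks belong to a single tensor-parallel group. A small sketch of the mapping (the helper name is hypothetical, not code from the issue):

```python
# Illustrative rank partitioning: consecutive global ranks form one
# tensor-parallel group.

def tp_groups(world_size, tp_size):
    """Partition global ranks [0, world_size) into TP groups of size tp_size."""
    assert world_size % tp_size == 0, "world size must be divisible by tp size"
    return [list(range(start, start + tp_size))
            for start in range(0, world_size, tp_size)]

print(tp_groups(2, 2))  # → [[0, 1]]: one TP group spanning both ranks
print(tp_groups(8, 2))  # → [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Each inner list is the kind of rank list you would typically pass to `torch.distributed.new_group(ranks=...)` when building the TP communicator.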
-
If someone is motivated, FP8 could be supported (on some hardware) by adapting to this new library:
https://github.com/NVIDIA/TransformerEngine
cc: @guillaumekln @francoisher…
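For context on what FP8 support entails: FP8 recipes track a per-tensor scale so that the largest magnitude lands near the E4M3 format's maximum of 448. A minimal sketch of that scaling step (illustrative only — not Transformer Engine's actual API):

```python
# Hedged sketch of per-tensor FP8 scaling. The helper name is hypothetical.

E4M3_MAX = 448.0  # largest finite value representable in fp8 E4M3

def fp8_scale(values):
    """Multiplicative scale so the largest |value| maps to E4M3_MAX."""
    amax = max(abs(x) for x in values)
    return E4M3_MAX / amax if amax > 0 else 1.0

xs = [0.5, -2.0, 1.25]
s = fp8_scale(xs)               # 448 / 2 = 224
scaled = [x * s for x in xs]    # now spans the full E4M3 range
print(s, max(abs(v) for v in scaled))  # → 224.0 448.0
```

In practice libraries like Transformer Engine amortize this by tracking a running amax history rather than recomputing the scale from each tensor.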
-
### Feature Idea
Saw the claim in this Reddit thread; hopefully the ideas there can also be brought into Comfy for even more speedups.
https://www.reddit.com/r/StableDiffusion/comments/1ex64jj/i_m…
-
Was chatting with @Chillee about our plans in AO today and he mentioned we should be focusing on a few concrete problems like
1. Demonstrate compelling perf for fp8 gemm at a variety of batch sizes.
…
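One way to frame "compelling perf at a variety of batch sizes" is arithmetic intensity: small-M GEMMs are memory-bound, where fp8's bandwidth savings dominate, while large-M GEMMs are compute-bound, where the fp8 tensor-core rate dominates. A hedged sketch of the calculation (byte counts assume 1-byte fp8 inputs and a 2-byte fp16 output; shapes are illustrative):

```python
# Illustrative arithmetic intensity (FLOPs per byte) for an M x K @ K x N GEMM.

def arithmetic_intensity(m, n, k):
    flops = 2 * m * n * k
    bytes_moved = m * k * 1 + k * n * 1 + m * n * 2  # A, B in fp8; C in fp16
    return flops / bytes_moved

for m in (1, 16, 256, 4096):
    print(m, round(arithmetic_intensity(m, 4096, 4096), 1))
```

At M = 1 the intensity is about 2 FLOPs/byte (pure weight streaming), so the relevant metric there is effective bandwidth, not TFLOP/s.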