-
## 🚀 Feature
cuDNN provides flexible support for performant GEMM/conv kernels with FP8 quantization. Since Thunder introduces FP8 casts in its traces, it can benefit from cuDNN fusions.
### Motivation
Today, thu…
-
Hi, I am trying to use FP8 with TransformerEngine. I am using a version of the GPT-NeoX repo, which uses DeepSpeed.
I can get FP8 to run in my MLPs with model parallelism, but when I use pipeline paralle…
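Not the original poster's code, but a minimal sketch of the knob that usually matters for multi-GPU FP8 in Transformer Engine: the `fp8_group` argument of `fp8_autocast`, which sets the process group over which FP8 amax statistics are reduced. The recipe values and the use of `dist.group.WORLD` are illustrative assumptions; picking the right group under DeepSpeed pipeline parallelism is exactly the open question here.
```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Launch with torchrun so the process-group env vars are set.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

layer = te.Linear(512, 512)  # TE modules default to the CUDA device
x = torch.randn(8, 512, device="cuda")
recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

# fp8_group controls which ranks reduce FP8 amax statistics together;
# dist.group.WORLD is only an illustrative default here.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe,
                     fp8_group=dist.group.WORLD):
    y = layer(x)
y.sum().backward()  # backward runs outside the autocast context
```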
-
### Your question
I got:
```
Total VRAM 8188 MB, total RAM 16011 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
…
```
-
Hi, how do I cast a float/bfloat16 tensor to FP8? I want to do W8A8 (FP8) quantization, but I didn't find an example of quantizing activations to the FP8 format.
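For reference, a minimal per-tensor sketch in plain PyTorch (>= 2.1, which exposes `torch.float8_e4m3fn`); the max-based scaling scheme is an illustrative choice, not a recommendation:
```python
import torch

# Minimal per-tensor FP8 (E4M3) quantization sketch.
def quantize_fp8(x: torch.Tensor):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3
    scale = x.abs().max().clamp(min=1e-12).float() / fp8_max
    x_fp8 = (x.float() / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale  # keep the scale to dequantize later

x = torch.randn(4, 4, dtype=torch.bfloat16)
x_fp8, scale = quantize_fp8(x)
x_deq = x_fp8.float() * scale  # approximate reconstruction
```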
-
Since Ada GPUs like the 4090 limit FP8 arithmetic to `fp32` accumulation, it only achieves the same peak `TFLOPs` as `fp16xfp16` with `fp16` accumulation.
Furthermore, according to my test,…
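For anyone who wants to reproduce this comparison, here is a rough micro-benchmark sketch. It relies on `torch._scaled_mm`, a private PyTorch API whose signature has changed between releases, so the call below (written against roughly PyTorch 2.4) is an assumption, and it needs an FP8-capable GPU (compute capability >= 8.9, e.g. Ada/4090):
```python
import time
import torch

n = 4096
a16 = torch.randn(n, n, device="cuda", dtype=torch.float16)
b16 = torch.randn(n, n, device="cuda", dtype=torch.float16)

a8 = a16.to(torch.float8_e4m3fn)
b8 = b16.to(torch.float8_e4m3fn).t().contiguous().t()  # mat2 must be column-major
one = torch.ones((), device="cuda")  # unit scales, float32

def tflops(fn, iters=50):
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return 2 * n**3 * iters / (time.time() - t0) / 1e12

# Default fp16 matmul (cuBLAS accumulates in fp32 here) vs the FP8 path
# with fp32 accumulation (use_fast_accum is left at its False default).
print("fp16:", tflops(lambda: a16 @ b16))
print("fp8 :", tflops(lambda: torch._scaled_mm(
    a8, b8, scale_a=one, scale_b=one, out_dtype=torch.float16)))
```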
-
Is there a way to run these models with 12 GB of RAM?
With fp8 models it works, but with GGUF models it always fails.
-
[context_flashattention_nopad_fp16_fp8.txt](https://github.com/user-attachments/files/16421521/context_flashattention_nopad_fp16_fp8.txt)
We have implemented an fp8 version of context_flashattention_…
-
### System Info
```shell
Optimum-habana v1.13.2
HL-SMI: hl-1.17.1-fw-51.5.0
Driver: 1.17.1-78932ae
```
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks…
-
### Feature request
I see that release version 1.12 supports fp8, but I didn't see any example code for training an LLM with FP8.
How can I use FP8 to train a model?
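The excerpt doesn't identify which project version 1.12 refers to, so as one concrete possibility only: NVIDIA Transformer Engine (discussed in an earlier item) wraps forward passes in `fp8_autocast` for FP8 training. A minimal sketch, with recipe values that are illustrative assumptions rather than tuned settings:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Minimal FP8 training step with Transformer Engine (illustrative values).
model = te.Linear(1024, 1024)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

x = torch.randn(32, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    loss = model(x).float().pow(2).mean()  # dummy loss
loss.backward()  # backward runs outside the autocast context
optimizer.step()
```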
### Motivation
I want t…
-
First of all, thanks for an amazing project! This runs on average 30% faster than Flux on ComfyUI. I was wondering if there's any planned support for different schedulers and samplers, like how you can…