-
```
Some parameters are on the meta device because they were offloaded to the CPU.
Quantizing weights: 0%| | 0/1771 [00:00
```
-
Is this planned?
I currently get this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
-
```
Prompt outputs failed validation
CheckpointLoaderSimple:
- Value not in list: ckpt_name: 'flux1-schnell-fp8.safetensors' not in []
Prompt outputs failed validation
CheckpointLoaderSimple:
- R…
```
-
### Problem Description
### Parsing OCP FP8 Model
This would require MIGraphX to expose the E4M3FN data type in the IR. Currently only the E4M3FNUZ type is exposed. It is probably not much work to expo…
-
As the title says: I have installed the ComfyUI_bitsandbytes_NF4 plugin. Loading the flux1-schnell_fp8_unet_vae_clip model produces the error below:
![image](https://github.com/user-attachments/assets/18127d10-29a2-44fc-a62c-0a29bd1fa0a6)
![image](https://github.co…
-
FP8 Linear does not work for me:
> - torch == 2.4.0 + cu121
> - torchao == 0.4.0
> - cuda_arch == 8.9 (NVIDIA L40)
```python
import torch
import torch.nn as nn
from torchao.float8 import conv…
```
-
Since ba01ad37, LoRAs loaded in 8-bit into a Q8_0 GGUF model generate at poor quality. Loading the LoRA in 16-bit appears to fix this issue, but there are subtle differences in the generations from rounding…
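The rounding difference described above can be reproduced in isolation. A simplified sketch (one scale per tensor rather than Q8_0's one scale per block of 32 weights; the helper name is hypothetical):

```python
import torch

# Simplified Q8_0-style round trip: int8 values plus a scale factor.
def q8_0_roundtrip(t: torch.Tensor) -> torch.Tensor:
    scale = t.abs().max() / 127.0
    return torch.round(t / scale).to(torch.int8).to(torch.float32) * scale

torch.manual_seed(0)
delta = torch.randn(64)  # stand-in for a LoRA weight delta
err = (q8_0_roundtrip(delta) - delta).abs().max()
print(err)  # worst-case rounding error is scale / 2
```

Applying the LoRA in 16-bit and re-quantizing afterwards rounds once instead of twice, which is consistent with the subtle generation differences reported.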
-
With the NF4 model at 1024×1024 resolution on an 8 GB 10-series or 20-series graphics card, generating a single image takes four minutes.
-
### Feature request
Support H100 training with FP8 in Trainer and DeepSpeed
### Motivation
FP8 should be much faster than FP16 on supported Hopper hardware, particularly with the DeepSpeed integration …
-
### Describe the bug
I tried to train the flux-dev model with LoRA on an A100 40GB, but it raises a CUDA out-of-memory exception.
### Reproduction
```
# Accelerate command
export MODEL_NAME="bl…
```
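Independent of the specific script, gradient checkpointing is the usual first lever for this kind of OOM: it trades recomputation for activation memory. A generic sketch with a toy model (not the flux trainer):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# An 8-layer toy stack; real trainers enable the same mechanism via
# model.gradient_checkpointing_enable() or an equivalent flag.
model = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])
x = torch.randn(4, 256, requires_grad=True)

# Keep activations only at 2 segment boundaries; recompute the rest
# during backward instead of storing them all.
y = checkpoint_sequential(model, 2, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([4, 256])
```

Other common levers for a 40 GB card are a smaller batch size with gradient accumulation and an 8-bit optimizer.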