-
```
Some parameters are on the meta device because they were offloaded to the CPU.
Quantizing weights: 0%| | 0/1771 [00:00
```
-
Is this planned?
I currently get this error when trying to run ComfyUI in fp8 (flags `--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet`):
```
RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'…
```
-
```
Prompt outputs failed validation
CheckpointLoaderSimple:
- Value not in list: ckpt_name: 'flux1-schnell-fp8.safetensors' not in []
Prompt outputs failed validation
CheckpointLoaderSimple:
- R…
```
-
### Problem Description
### Parsing OCP FP8 Model
This would require MIGraphX to expose the E4M3FN data type in the IR. Currently only the E4M3FNUZ type is exposed. It is probably not much work to expo…
-
As the title says: I have installed the ComfyUI_bitsandbytes_NF4 plugin. Loading the flux1-schnell_fp8_unet_vae_clip model produces the error below:
![image](https://github.com/user-attachments/assets/18127d10-29a2-44fc-a62c-0a29bd1fa0a6)
![image](https://github.co…
-
FP8 Linear does not work for me:
> - torch == 2.4.0 + cu121
> - torchao == 0.4.0
> - cuda_arch == 8.9 (NVIDIA L40)
```python
import torch
import torch.nn as nn
from torchao.float8 import conv…
```
-
Since ba01ad37, LoRAs loaded in 8-bit into a Q8_0 GGUF model generate at poor quality. Loading the LoRA in 16-bit appears to fix this issue, but there are subtle differences in the generations from rounding…
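The rounding difference described above can be reproduced in isolation. A simplified sketch (one scale per tensor rather than Q8_0's one scale per block of 32 weights; the helper name is hypothetical):

```python
import torch

# Simplified Q8_0-style round trip: int8 values plus a scale factor.
def q8_0_roundtrip(t: torch.Tensor) -> torch.Tensor:
    scale = t.abs().max() / 127.0
    return torch.round(t / scale).to(torch.int8).to(torch.float32) * scale

torch.manual_seed(0)
delta = torch.randn(64)  # stand-in for a LoRA weight delta
err = (q8_0_roundtrip(delta) - delta).abs().max()
print(err)  # worst-case rounding error is scale / 2
```

Applying the LoRA in 16-bit and re-quantizing afterwards rounds once instead of twice, which is consistent with the subtle generation differences reported.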
-
With the NF4 model at 1024×1024 resolution on an 8 GB 10-series or 20-series graphics card, generating a single image takes four minutes.
-
### Feature request
Support H100 training with FP8 in Trainer and DeepSpeed
### Motivation
FP8 should be much faster than FP16 on supported Hopper hardware, particularly with the DeepSpeed integration …
-
### Describe the bug
I tried to train the flux-dev model with LoRA on an A100 40GB, but it raises a CUDA out-of-memory exception.
### Reproduction
```
# Accelerate command
export MODEL_NAME="bl…
```
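Independent of the specific script, gradient checkpointing is the usual first lever for this kind of OOM: it trades recomputation for activation memory. A generic sketch with a toy model (not the flux trainer):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# An 8-layer toy stack; real trainers enable the same mechanism via
# model.gradient_checkpointing_enable() or an equivalent flag.
model = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])
x = torch.randn(4, 256, requires_grad=True)

# Keep activations only at 2 segment boundaries; recompute the rest
# during backward instead of storing them all.
y = checkpoint_sequential(model, 2, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([4, 256])
```

Other common levers for a 40 GB card are a smaller batch size with gradient accumulation and an 8-bit optimizer.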