MagyTheMage opened this issue 3 months ago
Same for me.
Using PyCharm 2024.1.1 (Professional Edition)
Build #PY-241.15989.155, built on April 29, 2024
Runtime version: 17.0.10+1-b1207.14 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.
There is a known issue with the SDXL VAE producing NaNs in fp16. Most fixes I've seen involve forcing Automatic1111 to use full precision, but that approach has performance drawbacks.
You can fix the NaN issue more directly by using this fixed half-precision VAE here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
Follow the instructions on that page to install and switch to this VAE.
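To illustrate why half precision can turn a perfectly finite computation into NaNs, here is a minimal sketch using only the Python standard library. It assumes standard IEEE 754 float16 behaviour, where the largest finite value is 65504: an intermediate value such as a squared activation overflows to inf, and a later inf - inf (or inf * 0) yields NaN. The `to_fp16` helper and the activation value are hypothetical and not taken from the actual SDXL VAE code.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision (float16) value


def to_fp16(x: float) -> float:
    """Round x to the nearest float16 value, returned as a Python float.

    struct's 'e' format packs IEEE 754 half floats. It raises OverflowError
    for finite values too large to represent; we map that to +/-inf, which is
    what fp16 arithmetic on a GPU produces instead of raising an exception.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# Hypothetical activation value, comfortably representable in fp16.
activation = to_fp16(300.0)

# Squaring it (as a normalisation step might) exceeds FP16_MAX.
squared = to_fp16(activation * activation)  # 90000 -> inf in fp16

# A later inf - inf (or inf * 0) turns the overflow into NaN.
centered = to_fp16(squared - squared)  # inf - inf -> nan

print(f"fp16: squared={squared}, centered={centered}")

# The same arithmetic in Python's float64 stays finite.
print(f"fp64: squared={300.0 * 300.0}, centered={300.0 * 300.0 - 300.0 * 300.0}")
```

In higher precision the intermediate value fits, which is why running the VAE in full precision (or using a VAE whose internal activations stay within the float16 range) avoids the NaNs.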
But the thing is, I was inpainting just fine, and then this started happening out of nowhere. Same session, so what the heck broke in the blink of an eye?
It started happening when I switched to an inpainting model that I had used before, though only for a few days. The main model, which I've been using for the last month, is a Pony XL model with a built-in VAE.
Checklist
What happened?
Whenever I attempt to generate an image, there is a seemingly random chance that the following error appears:
"NansException: A tensor with all NaNs was produced in Unet. This could be either because there is not enough precision to represent the picture or because your video card does not support the half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check"
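As the wording of the message suggests, this check fires only when every value in the output is NaN, not when just a few are. Below is a minimal pure-Python sketch of that condition, assuming a flat list of floats stands in for the tensor (the WebUI itself works on PyTorch tensors). `check_all_nan` and `NansError` are hypothetical names for illustration only.

```python
import math
from typing import Sequence


class NansError(RuntimeError):
    """Hypothetical stand-in for the WebUI's NansException."""


def check_all_nan(values: Sequence[float], where: str) -> None:
    """Raise only if *every* value is NaN; partially-NaN outputs pass."""
    if values and all(math.isnan(v) for v in values):
        raise NansError(f"A tensor with all NaNs was produced in {where}.")


check_all_nan([0.1, float("nan"), 0.3], "Unet")  # passes: only some NaNs

try:
    check_all_nan([float("nan")] * 4, "Unet")
except NansError as err:
    print(err)  # A tensor with all NaNs was produced in Unet.
```

Disabling such a check does not stop the NaNs from being produced; it only stops them from being reported, which is consistent with the black images described under Additional information.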
Steps to reproduce the problem
What should have happened?
WebUI should have started generating the image, but it didn't. Based on my testing, the longer the prompt, the more likely the error is; longer prompts simply fail even after repeated attempts.
What browsers do you use to access the UI ?
Other
Sysinfo
sysinfo-2024-05-23-17-46.json
Console logs
Additional information
Although my PC is relatively low-end and somewhat old (I'm currently working toward an upgrade), I have used Stable Diffusion for about a year, if not longer, and it never had any issues until a few months ago, when this error started appearing. I have not had other problems with the computer, including when running other generative AI tools such as Voice.AI and KoboldCpp.
The error is highly inconsistent: repeatedly clicking the Generate button may eventually produce an image, so it seems to be roughly a 50/50 chance whether it errors out or not.
Using --disable-nan-check prevents the error, but the result may be a black image. Since that wastes a full generation just to get a black image, it is faster to keep letting it error out until it starts generating.
Using --no-half stops the issue, but it increases RAM usage so much that the computer struggles to load the program or crashes it outright. When it doesn't crash, generation takes about ten times as long: a single 512x768 image goes from 1-2 minutes to 15-20 minutes. (The computer only has 8 GB of RAM and 4 GB of VRAM, so perhaps this solution works on a stronger machine, although Google searches about this problem show that even people with very good video cards experienced heavy lag with --no-half.)
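Part of the cost of --no-half is simple arithmetic: float32 stores each weight in 4 bytes instead of 2, so the weights alone need twice the memory. A rough sketch, assuming about 860 million UNet parameters (the figure commonly cited for the SD 1.x UNet; SDXL's UNet is considerably larger) and ignoring activations, the text encoder and the VAE. `weight_memory_gb` and `UNET_PARAMS` are hypothetical names.

```python
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: ~860M parameters, roughly the SD 1.x UNet. Activations,
# text encoder and VAE weights are ignored, so real usage is higher.
UNET_PARAMS = 860_000_000

for dtype in ("fp16", "fp32"):
    print(f"{dtype}: {weight_memory_gb(UNET_PARAMS, dtype):.2f} GB")
# fp16: 1.72 GB
# fp32: 3.44 GB
```

On a 4 GB card, roughly 3.4 GB of fp32 weights leaves very little headroom for everything else, which is consistent with the memory pressure and slowdown described above.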
Enabling "Upcast cross attention layer to float32" sometimes helps; in those cases I see this in the console:
A tensor with all NaNs was produced in VAE. Web UI will now convert VAE into 32-bit float and retry. To disable this behavior, disable the 'Automatically revert VAE to 32-bit floats' setting. To always start with 32-bit VAE, use --no-half-vae commandline flag.
But this is also inconsistent: it won't always fix the problem right away, and generation will simply error out.
I have tried countless solutions from Google: reinstalling, removing all extensions and LoRAs, using different launch arguments, disabling all extensions, using different models, reverting to older versions that I know worked fine, reinstalling and updating drivers, running several health checks on the computer, and doing completely clean installs. So far none of this has solved the issue, which is why I would like some help.
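The VAE console message above describes a fallback: decode in half precision first, and if the result is all NaN, convert the VAE to 32-bit float and retry. A minimal sketch of that control flow, assuming lists of floats stand in for tensors; `decode_with_fallback` and `toy_decode` are hypothetical names, and the toy decoder simply pretends that an internal value above the float16 limit turns the whole output into NaN.

```python
import math
from typing import Callable, List, Sequence

Decoder = Callable[[Sequence[float], str], List[float]]


def decode_with_fallback(latent: Sequence[float], decode: Decoder) -> List[float]:
    """Decode in fp16 first; if every output value is NaN, retry in fp32.

    Hypothetical sketch of the 'Automatically revert VAE to 32-bit floats'
    behaviour; `decode` stands in for the real VAE decoder.
    """
    out = decode(latent, "fp16")
    if out and all(math.isnan(v) for v in out):
        print("All NaNs in fp16 VAE output; retrying in fp32.")
        out = decode(latent, "fp32")
    return out


def toy_decode(latent: Sequence[float], precision: str) -> List[float]:
    """Toy decoder: an internal value v*v above the fp16 limit yields NaNs."""
    limit = 65504.0 if precision == "fp16" else math.inf
    if any(v * v > limit for v in latent):
        return [math.nan] * len(latent)
    return [v / 1000.0 for v in latent]


print(decode_with_fallback([10.0, 20.0], toy_decode))   # fp16 succeeds
print(decode_with_fallback([300.0, 20.0], toy_decode))  # falls back to fp32
```

A fallback of this kind only covers the VAE stage. An all-NaN result from the UNet, as in the error quoted at the top, is raised during sampling before the VAE runs, which may explain why this setting helps only some of the time.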