Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

[stable-diffusion-x4-upscaler] Using the pretrained VAE to encode a 512x512 image (normalized to [-1,1]) to latent space yields NaN #184

Open leeruibin opened 1 year ago

leeruibin commented 1 year ago

I have downloaded the stable-diffusion-x4-upscaler pretrained model from https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler

I am trying to fine-tune the upscaler model on my own data. However, when I encode a 512x512 image into the 128x128 latent space with the pretrained VAE weights, I get NaN values in the output tensor of size [b, 4, 128, 128].

I have traced the VAE forward function and found that, along the computation graph, the activations quickly become huge and numeric overflow occurs.
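For reference, the overflow-to-NaN mechanism described above can be reproduced with plain float16 arithmetic (a minimal sketch, not the actual VAE computation): once a value exceeds the float16 maximum (65504), it becomes inf, and a subsequent operation such as the mean-subtraction inside a normalization layer turns inf into NaN, which then propagates through the rest of the network.

```python
import numpy as np

# float16 can only represent magnitudes up to 65504; larger values overflow to inf.
with np.errstate(over="ignore", invalid="ignore"):
    x = np.float16(60000.0)
    y = x * np.float16(2.0)   # exceeds the float16 range -> inf
    print(y)                  # inf

    # Once an inf appears, common follow-up ops (e.g. subtracting the mean in a
    # norm layer) produce NaN.
    z = y - y                 # inf - inf -> nan
    print(np.isnan(z))        # True
```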

[screenshot: intermediate activation values in the VAE forward pass]

Since there is no fine-tuning script for this x4-upscaler model, I used the Stable Diffusion fine-tuning script at the following link and modified it for my own dataset: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py

Is there any solution to this error?

sczhou commented 1 year ago

Hi @leeruibin, same here. Did you solve this problem?

vipzhe commented 1 year ago

@leeruibin, I ran into the same problem. Do you know how to solve it?

vipzhe commented 1 year ago

Not using half precision works: run the VAE in fp32 instead of fp16.
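This matches the overflow explanation above: the VAE's intermediate activations exceed the float16 range but fit comfortably in float32, so keeping the VAE in full precision (e.g. `vae.to(torch.float32)` before encoding, with the input cast to match) avoids the NaN. A minimal numeric sketch of why fp32 is safe where fp16 is not:

```python
import numpy as np

a = np.float16(60000.0)
b = np.float16(2.0)

# In float16 the product overflows to inf...
with np.errstate(over="ignore"):
    half = a * b
print(half)                   # inf

# ...but the same computation carried out in float32 is well within range
# (float32 max is ~3.4e38 vs 65504 for float16).
full = np.float32(a) * np.float32(b)
print(full)                   # 120000.0
print(np.isfinite(full))      # True
```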

Harper714 commented 1 year ago

Hi, has anyone fine-tuned the upscaler model successfully?

oubotong commented 1 year ago

Same issue. Looking for guidance on fine-tuning the x4 upscaler model.