I am trying to run this on Stable Diffusion 2.1 but I keep getting black images

cloneofsimo / paint-with-words-sd

Implementation of Paint-with-words with Stable Diffusion: a method from eDiff-I that lets you generate an image from a text-labeled segmentation map.

MIT License


Open nirbenda opened 1 year ago

nirbenda commented 1 year ago

Running the code on Stable Diffusion 2.1 returns black images (filled with NaN values).

After investigating, I found that noise_pred_text is all -Inf on SD 2.1, whereas on SD 1.4 it contains valid values.

Any idea what changed between the two versions that might cause this?

(Running on Linux. Plain SD 2.1 image generation and DreamBooth training both work for me.)
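One way to narrow this kind of problem down is to count the non-finite entries in each intermediate tensor and find the first one that goes bad. The following is a minimal sketch, not code from the repository: summarize_non_finite is a hypothetical helper name, and the example tensor is a stand-in for noise_pred_text.

```python
import torch


def summarize_non_finite(name: str, tensor: torch.Tensor) -> dict:
    """Count NaN, +Inf and -Inf entries in a tensor (hypothetical debugging helper)."""
    return {
        "name": name,
        "dtype": str(tensor.dtype),
        "nan": int(torch.isnan(tensor).sum()),
        "posinf": int(torch.isposinf(tensor).sum()),
        "neginf": int(torch.isneginf(tensor).sum()),
    }


# Stand-in values; in the real denoising loop this would be noise_pred_text.
example = torch.tensor([0.5, float("-inf"), float("nan")])
report = summarize_non_finite("noise_pred_text", example)
print(report)
```

Calling the helper on the latents, the text embeddings and each noise prediction in turn should show which step first produces -Inf.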

nirbenda commented 1 year ago

UPDATE:

I was able to make the example code run by removing the @torch.autocast("cuda") decorator and changing torch.float16 to torch.float32.

That said, the resulting image was not the expected output but just a noisy background. [image: output_cat_dog]
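As a possible explanation for why the dtype change removes the Inf/NaN values, the sketch below shows how a value that is fine in float32 overflows in float16, and how infinities then turn into NaN. It only illustrates the numeric behavior. It is an assumption, not something confirmed in this thread, that an overflow of this kind is what happens inside the SD 2.1 forward pass.

```python
import torch

# Largest finite value representable in float16.
fp16_max = torch.finfo(torch.float16).max  # 65504.0

# Illustrative magnitudes just outside the float16 range.
values = torch.tensor([70000.0, -70000.0], dtype=torch.float32)

as_fp16 = values.to(torch.float16)  # overflows to +inf / -inf
as_fp32 = values.clone()            # stays finite in float32

# Arithmetic on infinities (inf - inf) is undefined and yields NaN.
# The subtraction is done in float32 so the snippet does not depend on
# half-precision CPU kernels.
nan_from_inf = as_fp16.float() - as_fp16.float()

print("float16 max:", fp16_max)
print("float16 cast:", as_fp16)
print("float32 kept:", as_fp32)
print("inf - inf:", nan_from_inf)
```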

MentalGear commented 1 year ago

Thanks for the update, @nirbenda. Please let me know about any further progress :)

MingzhaoYang commented 1 year ago

@nirbenda Resizing the input color map to 768×768 and adding prediction_type="v_prediction" to the LMSDiscreteScheduler used in the pipeline may help.