Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Bad image quality with v-prediction model on 512x512 resolution #296

Open meatybobby opened 1 year ago

meatybobby commented 1 year ago

The README shows v2.0-v achieving a better CLIP score than v2.0-base. [image: t2i] However, I get very poor image quality when I use v2.0-v at 512x512 resolution. [image: t2i] With the same config at 768x768 resolution, it works well. Is this expected behavior for the v-prediction model? Were the FID and CLIP scores in the README actually measured at 768x768 for the v2.0-v model?
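
For context, here is a minimal sketch of what "v-prediction" usually refers to, assuming the common parameterization where the noisy sample is `x_t = alpha_t * x0 + sigma_t * eps` with `alpha_t**2 + sigma_t**2 == 1` and the network is trained to predict `v = alpha_t * eps - sigma_t * x0`. This is not taken from the repository's code; the function names are hypothetical and plain Python lists stand in for latent tensors.

```python
import math

def noisy_sample(x0, eps, alpha_t, sigma_t):
    # Forward process (assumed form): x_t = alpha_t * x0 + sigma_t * eps
    return [alpha_t * x + sigma_t * e for x, e in zip(x0, eps)]

def v_target(x0, eps, alpha_t, sigma_t):
    # v-prediction training target (assumed form): v = alpha_t * eps - sigma_t * x0
    return [alpha_t * e - sigma_t * x for x, e in zip(x0, eps)]

def recover_x0(x_t, v, alpha_t, sigma_t):
    # Invert the two relations above: x0 = alpha_t * x_t - sigma_t * v
    return [alpha_t * xt - sigma_t * vi for xt, vi in zip(x_t, v)]

def recover_eps(x_t, v, alpha_t, sigma_t):
    # Invert the two relations above: eps = sigma_t * x_t + alpha_t * v
    return [sigma_t * xt + alpha_t * vi for xt, vi in zip(x_t, v)]

# Hypothetical toy values; alpha_t and sigma_t lie on the unit circle.
alpha_t, sigma_t = math.cos(0.3), math.sin(0.3)
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.7, -0.4]

x_t = noisy_sample(x0, eps, alpha_t, sigma_t)
v = v_target(x0, eps, alpha_t, sigma_t)
x0_back = recover_x0(x_t, v, alpha_t, sigma_t)
eps_back = recover_eps(x_t, v, alpha_t, sigma_t)
```

The sketch only shows how the prediction target differs from plain noise prediction. It says nothing about resolution, which depends on the data the checkpoint was trained on.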