benrugg / AI-Render

Stable Diffusion in Blender
MIT License

Difference between UI prompting & API prompting #65

Closed Michaelvirga closed 1 year ago

Michaelvirga commented 1 year ago

Describe the bug

Hello!

I have been trying to recreate API calls using your script (with some modifications so I can run it directly from the CLI), and I'm noticing key differences between the img2img generations made through the API calls and what is generated when I use the exact same parameters and seed in the base Stable Diffusion UI.

The main difference I see is that I am not able to use high CFG_SCALE values when calling through the API. Through the UI, I can go up to the mid-to-high 20s before I start seeing a lot of noise (blue striations and massive unwanted streaks) in the images. However, when I use the API, if I go over a CFG_SCALE of 10, at any img similarity, I begin to get these unwanted streaks.

Do you have any idea why this might be? I made sure all parameters, including sampler, cfg scale, denoising strength, steps, etc., are honored.

Curious to hear your thoughts. I went into my SD code and printed out the parameters to compare. These are not one-to-one variables, because different workflows are called internally in SD depending on whether it's an API call or a UI call, but theoretically these API calls should not be as overbaked as they are coming out.

To reproduce

UI CALL:

'prompt': 'beautiful holy city on top of a mountain, sci-fi concept art, hyperdetailed, 8k, trending on Artstation', 'prompt_for_display': None, 'negative_prompt': '', 'styles': ['None', 'None'], 'seed': 1748681168.0, 'subseed': -1, 'subseed_strength': 0, 'seed_resize_from_h': 0, 'seed_resize_from_w': 0, 'sampler_index': 2, 'batch_size': 1, 'n_iter': 1, 'steps': 100, 'cfg_scale': 30, 'width': 576, 'height': 1024, 'restore_faces': False, 'tiling': False, 'do_not_save_samples': False, 'do_not_save_grid': False, 'extra_generation_params': {}, 'overlay_images': None, 'eta': None, 'do_not_reload_embeddings': False, 'paste_to': None, 'color_corrections': None, 'denoising_strength': 0.27, 'sampler_noise_scheduler_override': None, 'ddim_discretize': 'uniform', 's_churn': 0, 's_tmin': 0, 's_tmax': inf, 's_noise': 1.0, 'override_settings': {}, 'scripts': None, 'script_args': None, 'all_prompts': None, 'all_seeds': None, 'all_subseeds': None, 'init_images': [<PIL.Image.Image image mode=RGB size=1080x1920 at 0x197BCFA9960>], 'resize_mode': 0, 'init_latent': None, 'image_mask': None, 'latent_mask': None, 'mask_for_overlay': None, 'mask_blur': 4, 'inpainting_fill': 1, 'inpaint_full_res': False, 'inpaint_full_res_padding': 32, 'inpainting_mask_invert': 0, 'mask': None, 'nmask': None, 'image_conditioning': None

API CALL:

'resize_mode': 0, 'denoising_strength': 0.23, 'mask': None, 'mask_blur': 4, 'inpainting_fill': 0, 'inpaint_full_res': True, 'inpaint_full_res_padding': 0, 'inpainting_mask_invert': 0, 'prompt': 'beautiful holy city on top of a mountain, sci-fi concept art, hyperdetailed, 8k, trending on Artstation', 'styles': None, 'seed': 1748681168, 'subseed': -1, 'subseed_strength': 0, 'seed_resize_from_h': -1, 'seed_resize_from_w': -1, 'batch_size': 1, 'n_iter': 1, 'steps': 50, 'cfg_scale': 30.0, 'width': 576, 'height': 1024, 'restore_faces': False, 'tiling': False, 'negative_prompt': None, 'eta': None, 's_churn': 0.0, 's_tmax': None, 's_tmin': 0.0, 's_noise': 1.0, 'override_settings': None, 'sampler_index': 'LMS', 'include_init_images': False}
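
As a quick way to spot what actually differs between the two calls, here is a minimal sketch that diffs a hand-copied subset of the fields above (the real objects contain many more keys; the values below are taken straight from the two dumps):

# Minimal sketch: diff a hand-copied subset of the two parameter dumps
# to see where the UI call and the API call actually disagree.
ui_call = {
    "sampler_index": 2,        # numeric index into the UI's sampler list
    "steps": 100,
    "cfg_scale": 30,
    "denoising_strength": 0.27,
    "inpainting_fill": 1,
    "inpaint_full_res": False,
    "seed_resize_from_h": 0,
}
api_call = {
    "sampler_index": "LMS",    # string name, routed through a different code path
    "steps": 50,
    "cfg_scale": 30.0,
    "denoising_strength": 0.23,
    "inpainting_fill": 0,
    "inpaint_full_res": True,
    "seed_resize_from_h": -1,
}

for key in sorted(set(ui_call) | set(api_call)):
    if ui_call.get(key) != api_call.get(key):
        print(f"{key}: UI={ui_call.get(key)!r}  API={api_call.get(key)!r}")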

Error log

No direct errors are printed

Environment

Screenshots/video

No response

Additional information

No response

Michaelvirga commented 1 year ago

(image attachment: tmp9gywwiwe)

Michaelvirga commented 1 year ago

(image attachment: 0026)

benrugg commented 1 year ago

Hmm, that's a really interesting question. I don't know what's going on with that, but I'll throw out a couple potential things to check:

You could try asking this question on the Stability discord in the #api channel. There are devs there who work at Stability and have been responsive to all sorts of questions.

Michaelvirga commented 1 year ago

After a crazy amount of digging, I just figured it out about 10 minutes ago! It is extremely strange, but in the SD code, for API calls only, they seem to hard-code the "sampler_index" field, which ultimately calls a different "sampler script" than any of the ones we can choose from the UI.
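
For context, the request I am recreating looks roughly like the sketch below (assuming a local Automatic1111 webui started with --api; the endpoint and field names are my reconstruction, not code from this repo). Per the finding above, the sampler field is the one the API path appears to override:

# Rough sketch of the img2img API call being compared against the UI.
# Assumes an Automatic1111 webui running locally with the --api flag;
# field names are an assumption and may differ between webui versions.
import base64
import requests

with open("init_frame.png", "rb") as f:  # hypothetical init image path
    init_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_b64],
    "prompt": "beautiful holy city on top of a mountain, sci-fi concept art, "
              "hyperdetailed, 8k, trending on Artstation",
    "seed": 1748681168,
    "steps": 100,
    "cfg_scale": 30,
    "denoising_strength": 0.27,
    "width": 576,
    "height": 1024,
    "sampler_index": "LMS",  # per the finding above, the API path seems to override this
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()
images = resp.json().get("images", [])  # base64-encoded result images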

I am experimenting with a high cfg_scale because it basically creates a crazy amount of detail while keeping img_similarity high. This effectively lets the composition remain more consistent in frame-to-frame animations while the actual details can change significantly!

benrugg commented 1 year ago

Ahh, interesting. That's weird that they're using their own custom sampler script. And that makes sense about the cfg_scale. Thanks for filling me in.

I'm hoping that when depth2img is offered in the API, it'll be game changing for animation consistency!

Michaelvirga commented 1 year ago

Wait, what is depth2image?

benrugg commented 1 year ago

depth2img is a new feature in Stable Diffusion 2.0 that infers depth information from the initial image, and uses that to guide the diffusion process. I think it might yield much more stable animations. So far it's not available in the DreamStudio API, but I think it's coming very soon.

https://stability.ai/blog/stable-diffusion-v2-release#block-yui_3_17_2_1_1669237481819_8176
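
For anyone who wants to try it locally before the API support lands, here is a minimal sketch using the Hugging Face diffusers depth2img pipeline (this is just my assumption of the simplest route to test it; it is not AI-Render's own code path):

# Minimal sketch: run SD 2.0 depth2img locally via diffusers.
# The init image path is hypothetical; strength/guidance values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("render_frame.png")  # e.g. a Blender render frame

result = pipe(
    prompt="beautiful holy city on top of a mountain, sci-fi concept art",
    image=init_image,      # depth is inferred from this image and guides the diffusion
    strength=0.7,          # how far the result may drift from the init frame
    guidance_scale=9.0,
).images[0]
result.save("depth2img_frame.png")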

Michaelvirga commented 1 year ago

Very interesting!

Do you think it will be able to create composition-aware frame-by-frame animations? Right now it seems like it's just creating a depth map while doing the same type of diffusion, though I haven't done an apples-to-apples comparison with the same seed and image, using img2img instead.

benrugg commented 1 year ago

Interesting. Were you testing it with Automatic1111, or are you just speculating? My understanding is that it infers a depth map in order to create something closer to a composition-aware diffusion. I'm hoping to test it soon!

Michaelvirga commented 1 year ago

I haven't actually gotten the latest version of Stable Diffusion with depth2img yet, so I might be mistaken about the capability. I've only tested in the web UI. I don't want to overwrite my local code changes, but this sounds really promising and it's my exact use case... I might just make a second installation to test it. It would be incredible if it could basically keep the composition and commit to an understanding of an "environment" for an entire animation run.
