chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

Inpainting only works with 512x512 using stable-fast #121

Open lunatolun opened 4 months ago

lunatolun commented 4 months ago

I have tested this in many different ways. The exact same code and parameters work with AutoPipelineForInpainting (I also tried StableDiffusionInpaintPipeline) using EulerAncestralDiscreteScheduler, but not with stable-fast. However, if I change nothing in the code and provide a 512x512 image and a 512x512 mask, stable-fast works as expected (AND FAST!). I have no idea why this happens. Using diffusers==0.26.2.

The GPU is an RTX 4090. I have tried many dimensions. Behavior: it gets stuck at 0% without any errors: 0%| | 0/30 [00:00<?, ?it/s]

The script below either uses stable-fast (warmed up at 512x512, just like the examples, with no issues during that phase), or I can disable stable-fast and run with the same parameters (AutoPipelineForInpainting, EulerAncestralDiscreteScheduler), same images, and same code, and it works as intended.

    # `pipeline` here is either the warmed-up stable-fast pipeline or AutoPipelineForInpainting
    repainted_image = pipeline(
        prompt=pos,
        negative_prompt=neg,
        guidance_scale=cfg,
        width=width,
        height=height,
        image=original,
        mask_image=mask,
        strength=1,
        num_inference_steps=steps,
        num_images_per_prompt=1,
        generator=generator,
    ).images[0]

TL;DR: 512x512 works both with and without stable-fast; any other resolution only works if I don't use stable-fast.

PS: No issues with StableDiffusionPipeline, just inpainting. PPS: Thanks for the great work.
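Since trace-based compilers tend to specialize the compiled graph to the shapes seen during warm-up, one thing worth trying (an assumption on my part, not a confirmed fix) is to warm the pipeline up once at every resolution you intend to inpaint at. A minimal sketch of that loop, where `run_inpaint` is a hypothetical stand-in for the real pipeline call above:

```python
# Hedged workaround sketch: warm up once per target resolution so a
# shape-specialized graph exists for each. `run_inpaint` is a stand-in
# for the real pipeline(...) call, not a stable-fast API.

def warmup(run_inpaint, resolutions):
    """Run one cheap 1-step pass per resolution you plan to use."""
    for w, h in resolutions:
        run_inpaint(width=w, height=h, num_inference_steps=1)

# Demo with a stand-in that just records the shapes it was called with:
seen = []
warmup(lambda **kw: seen.append((kw["width"], kw["height"])),
       [(512, 512), (768, 512), (1024, 1024)])
print(seen)  # [(512, 512), (768, 512), (1024, 1024)]
```

In the real script, `run_inpaint` would be the inpainting call from the report (with a resized image and mask per resolution).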

lunatolun commented 4 months ago

Update: I think LoRAs don't work with StableDiffusionPipeline (with stable-fast) after updating diffusers. I will try downgrading tomorrow, since I haven't tested stable-fast inpainting with the previous version of diffusers, and report back. Let me know if there is any other info I can provide.

lunatolun commented 4 months ago

I have downgraded and the issue persists. Also, textual inversion and LoRAs don't work for some reason. After importing the LoRA, I'm doing this:

        pipeline.unload_lora_weights()
        for lora in loras:
            pipeline.load_lora_weights(f'./lora/{lora}.safetensors', weight_name=f'{lora}.safetensors', adapter_name=lora, local_files_only=True)

        pipeline.set_adapters(loras, adapter_weights=weights)

Text-to-image generation succeeds, but no LoRAs are loaded or applied. When I run it without stable-fast, they work just fine.

chengzeyi commented 4 months ago

@lunatolun Could you provide a full script so that the problem can be reproduced?

yuqilinaa commented 2 days ago

Do you get a segmentation fault when using other resolutions? I ran into the same situation. It turned out that the up- and down-sampling blocks contain multiple branches, and after stable-fast compiles the model, some of those branches may not be captured in the traced graph. If only the branch below ends up in the graph and the feature height and width are no longer divisible by 2, an error occurs.

Code from diffusers/models/resnet.py:

        if output_size is None:
            hidden_states = F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
        else:
            hidden_states = F.interpolate(hidden_states, size=output_size, mode="nearest")
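To illustrate the branch-specialization point above, here is a pure-Python analogue (no torch; `upsample_size` and `trace` are illustrative stand-ins, not stable-fast or diffusers APIs) of how trace-based compilation freezes a data-dependent `if`: only the branch taken for the warm-up input survives.

```python
# Pure-Python sketch of trace-based branch specialization.
# Assumption: names here are illustrative, not real library APIs.

def upsample_size(h, output_size=None):
    # Mirrors the two Upsample2D branches quoted above
    if output_size is None:
        return h * 2        # scale_factor=2.0 branch
    return output_size      # explicit output_size branch

def trace(fn, h, output_size=None):
    """Record which branch the example input takes, then always replay it."""
    if output_size is None:
        return lambda h, output_size=None: h * 2
    return lambda h, output_size=None: output_size

traced = trace(upsample_size, 512)   # "warmed up" with output_size=None
print(traced(512))       # 1024 -- matches eager execution
print(traced(512, 768))  # 1024 -- wrong: the other branch was never captured
```

This is why a graph traced at 512x512 can misbehave (or crash) at resolutions that would have exercised the other branch in eager mode.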