chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

compile_unet error for `StableDiffusionLatentUpscalePipeline` #106

Closed kunc01 closed 8 months ago

kunc01 commented 8 months ago

Hi! Thanks for your amazing work.

I've tested on a V100 GPU: stable-fast works perfectly for StableDiffusionControlNetInpaintPipeline, but for the latent upscaler and its corresponding StableDiffusionLatentUpscalePipeline, sfast.compilers.diffusion_pipeline_compiler.compile_unet leads to an error (error screenshot), while vae and text_encoder still work well.

I've tried options like enable_xformers, enable_triton, enable_cuda_graph, prefer_lowp_gemm and enable_fused_linear_geglu, but the error persists.

Looking forward to your reply, thank you!
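For reference, a minimal sketch of the failing setup as I understand it (requires a CUDA GPU; the model id and the exact location of CompilationConfig are assumptions based on the usual stable-fast examples):

```python
import torch
from diffusers import StableDiffusionLatentUpscalePipeline
from sfast.compilers.diffusion_pipeline_compiler import (
    CompilationConfig,
    compile_unet,
)

# Load the latent upscaler pipeline in fp16 on the GPU.
upsampler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")

# Default compilation config; toggling enable_xformers / enable_triton /
# enable_cuda_graph etc. did not change the outcome for me.
config = CompilationConfig.Default()

# compile_unet is where the error occurs; compiling vae and
# text_encoder the same way works fine.
upsampler.unet = compile_unet(upsampler.unet, config)
```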

chengzeyi commented 8 months ago

@kunc01 This problem has been fixed in the latest nightly release. Please try it.

kunc01 commented 8 months ago

@chengzeyi Thanks for your reply. I tried 1.0.2.dev20240127+torch211cu118 on a V100 and 1.0.2.dev20240127+torch211cu121 on an A10, but found new errors (error screenshot).

chengzeyi commented 8 months ago

> @chengzeyi Thanks for your reply. I tried 1.0.2.dev20240127+torch211cu118 on a V100 and 1.0.2.dev20240127+torch211cu121 on an A10, but found new errors (error screenshot).

Just fixed it. Please retry.

kunc01 commented 8 months ago

@chengzeyi Still hitting an error (error screenshot).

chengzeyi commented 8 months ago

> @chengzeyi Still hitting an error (error screenshot).

I guess this is related to Triton. Can you try running with Triton disabled? Alternatively, you could share a minimal reproducing script so that I can reproduce it quickly.

kunc01 commented 8 months ago

@chengzeyi Inference succeeded with `enable_triton = False`:

```python
upsampler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")
config = CompilationConfig.Default()
config.enable_xformers = True
config.enable_triton = False
config.enable_cuda_graph = True
upsampler = compile(upsampler, config)
```