genforce / ctrl-x

Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)
https://genforce.github.io/ctrl-x

Got a blurred horse #7

Open kono-dada opened 1 month ago

kono-dada commented 1 month ago

[image: result]

I used the provided script and got a blurred horse. How can I solve this?

kuanhenglin commented 1 month ago

That's strange, can you provide the command that you used to run this?

Edit: I just downloaded my own code onto a new machine and executed the script (with the arguments I specified in the README) and everything turned out fine, so I'm wondering how you managed to get this result.

kono-dada commented 1 month ago

Thanks a lot for running the code again for me.

I used the exact command provided in README.md. However, I changed the script to load the model via from_single_file, as below:

    model_id_or_path = "SDXL.safetensors"
    refiner_id_or_path = "refiner.safetensors"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    variant = "fp16" if device == "cuda" else "fp32"

    # scheduler = DDIMScheduler.from_config('stabilityai/stable-diffusion-xl-base-1.0', subfolder="scheduler")  # TODO: Support schedulers beyond DDIM

    pipe = CtrlXStableDiffusionXLPipeline.from_single_file(
        model_id_or_path,
        # scheduler=scheduler,
        torch_dtype=torch_dtype,
        # variant=variant,
        use_safetensors=True,
    )

I am sure the safetensors file I loaded is the original file from stabilityai/stable-diffusion-xl-base-1.0, since I checked its SHA256, which is 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b.

I am not sure whether the change above is the cause of the problem.

kono-dada commented 1 month ago

I found where the problem was.

The scheduler must be DDIM, and cannot be the default EulerDiscreteScheduler. After I used DDIM, the result was correct.

I cannot figure out why, even though I have read the paper. I would really appreciate a brief explanation.

kuanhenglin commented 1 month ago

This is more of a HuggingFace-related problem than other schedulers inherently not working with our method.

Specifically, Ctrl-X uses self-recurrence, i.e., at certain timesteps we add noise to $\mathbf{x}_{t - 1}$ to obtain $\mathbf{x}_t$ (without control). This slightly improves image quality (and reduces artifacting), and I've also found that it generally improves appearance transfer. However, to do so with HuggingFace schedulers, we need to manually update the .step attribute. You can find that update here in the code.
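For readers unfamiliar with the noise-adding step, here is a minimal sketch. It assumes the standard DDPM-style forward transition, $\mathbf{x}_t = \sqrt{\bar\alpha_t / \bar\alpha_{t-1}}\,\mathbf{x}_{t-1} + \sqrt{1 - \bar\alpha_t / \bar\alpha_{t-1}}\,\boldsymbol{\epsilon}$, which may differ in detail from what the repository does. The function name `renoise` and the example values are hypothetical and use plain Python lists instead of latent tensors.

```python
import math
import random


def renoise(x_prev, alpha_bar_prev, alpha_bar_t, rng):
    """Map x_{t-1} back to x_t with one forward diffusion step.

    Assumption: alpha_bar is the cumulative product of alphas, so
    alpha_bar_t <= alpha_bar_prev (timestep t is noisier than t-1).
    """
    ratio = alpha_bar_t / alpha_bar_prev  # per-step alpha_t
    signal_scale = math.sqrt(ratio)
    noise_scale = math.sqrt(1.0 - ratio)
    # Scale the signal down and add fresh Gaussian noise, element by element.
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x_prev]


rng = random.Random(0)  # seeded only so the sketch is reproducible
x_prev = [0.5, -1.0, 2.0]  # stand-in for the latents at timestep t-1
x_t = renoise(x_prev, alpha_bar_prev=0.9, alpha_bar_t=0.8, rng=rng)
```

Schedulers that are not parameterized this way (for example, sigma-based ones) would need a different noise-adding rule.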

The problem is that, while testing, I found that a lot of other HuggingFace schedulers either don't use the .step attribute (and use another method for internal counting) or override the .step attribute during some callback/hook, which undoes our changes. The noise-adding process here will also need to be modified to work with the more complex schedulers. I am currently planning to fix this issue (and have Ctrl-X work with more schedulers), but I cannot guarantee it will be soon.
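To illustrate the counter problem, here is a toy sketch. `ToyScheduler`, its `step_index` counter, and `denoise_with_recurrence` are all hypothetical and are not the HuggingFace API or the Ctrl-X code. The sketch only shows why a scheduler that advances an internal counter on every call has to be rewound by hand when a timestep is repeated.

```python
class ToyScheduler:
    """Hypothetical scheduler that tracks its position with an internal counter."""

    def __init__(self, timesteps):
        self.timesteps = list(timesteps)
        self.step_index = 0  # hypothetical counter name

    def step(self, x):
        t = self.timesteps[self.step_index]
        self.step_index += 1  # advances on every call
        return x, t  # a real scheduler would also denoise x here


def denoise_with_recurrence(scheduler, x, repeats):
    """Visit every timestep (1 + repeats) times, rewinding the counter each time."""
    visited = []
    for i in range(len(scheduler.timesteps)):
        for _ in range(repeats + 1):
            scheduler.step_index = i  # manual rewind; without it the counter drifts
            x, t = scheduler.step(x)
            visited.append(t)
    return visited


visited = denoise_with_recurrence(ToyScheduler([999, 666, 333]), x=0.0, repeats=1)
print(visited)
```

If a scheduler keeps its position somewhere else, or resets the counter in a hook, a rewind like this has no effect, which matches the failure mode described above.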

The solution right now is to simply turn off self-recurrence. You can do this in the GUI (app_sdxl.py): check "Use advanced config" and change

self_recurrence_schedule:
    - [0.1, 0.5, 2]  # format: [start, end, num_recurrence]

to

self_recurrence_schedule: []

to disable self-recurrence. Note that this will likely make appearance transfer and artifacting a bit worse. I acknowledge this is rather annoying to do, so in the next few days I will add a simpler option to both the GUI and the script.
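As a rough illustration of what the two config values mean, the sketch below expands a schedule in the `[start, end, num_recurrence]` format into a per-step repeat count. The helper name `expand_self_recurrence_schedule` and the rule for mapping the fractions onto step indices are assumptions made for this example; the repository may round or index differently.

```python
def expand_self_recurrence_schedule(schedule, num_inference_steps):
    """Turn [start, end, num_recurrence] entries into one repeat count per step.

    Assumption: start and end are fractions of the total number of inference
    steps, and the range is half-open, [start, end).
    """
    counts = [0] * num_inference_steps
    for start, end, num_recurrence in schedule:
        first = round(start * num_inference_steps)
        last = round(end * num_inference_steps)
        for i in range(first, last):
            counts[i] = num_recurrence
    return counts


# Default from the config above: repeat twice over the 10%-50% range of steps.
default = expand_self_recurrence_schedule([[0.1, 0.5, 2]], num_inference_steps=10)
# Empty schedule: no step is ever repeated, so the scheduler is never rewound.
disabled = expand_self_recurrence_schedule([], num_inference_steps=10)
```

With an empty schedule every count is zero, so the scheduler's internal counting is left alone.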

kono-dada commented 3 weeks ago

You've done amazing work. Thanks for your reply.