CFGpp-diffusion / CFGpp

Official repository for "CFG++: manifold-constrained classifier free guidance for diffusion models"

I get nonsensical results #6

Open andreaferretti opened 3 days ago

andreaferretti commented 3 days ago

I just ran the editing example from the README verbatim:

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_inversion_cfg++" --cfg_guidance 0.6

but I get nonsensical results. Even with a larger number of steps (--NFE 50), this is what I get as output:

(image: reconstruct)

So I tried switching to SDXL with

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --NFE 50 --model sdxl

but I get an IndexError (list index out of range) in the text embedding step. More precisely:

Traceback (most recent call last):
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 72, in <module>
    main()
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 53, in main
    result = solver.sample(prompt1=[args.null_prompt, args.prompt],
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/latent_sdxl.py", line 391, in sample
    pool_tgt_prompt_embed) = self.get_text_embed(prompt1[0], prompt1[2], prompt2[0], prompt2[2], clip_skip)
IndexError: list index out of range

Am I doing something wrong?

andreaferretti commented 3 days ago

This was just an error in the examples/inversion.py script: the prompt has to be passed twice (I guess for the two text encoders of SDXL). Still, this is what I get at 50 NFE with

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --model sdxl --NFE 50

(image: reconstruct)

It seems pretty far from the intended result :-/
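For anyone hitting the same IndexError: the traceback shows the script building a two-element list while latent_sdxl.py reads entries 0 and 2 of it. Below is a minimal sketch that reproduces this in isolation. The helper pick_prompts is hypothetical and only mimics the indexing pattern from the traceback, and the three-element layout is my assumption about what "passing the prompt twice" amounts to, not something confirmed by the repository.

```python
# Hypothetical stand-ins for args.null_prompt and args.prompt.
null_prompt = ""
prompt = "a photography of baby fox"


def pick_prompts(prompt1):
    """Mimic the indexing seen in the traceback: read entries 0 and 2."""
    return prompt1[0], prompt1[2]


# Two entries, as in the traceback's call site: index 2 does not exist.
try:
    pick_prompts([null_prompt, prompt])
    failed = False
except IndexError:
    failed = True

# Assumption: repeating the prompt gives the list the third entry it needs.
picked = pick_prompts([null_prompt, prompt, prompt])

print(failed)
print(picked)
```

This only explains the crash. It says nothing about why the reconstruction itself still looks wrong.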