Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Is the crossattn-adm option available? #276

Open emily-swatchon opened 1 year ago

emily-swatchon commented 1 year ago

I am trying to use the crossattn-adm option to fine-tune the text-to-image model so that it is conditioned on both the text prompt and a class label.

I set up the config as follows:

model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    cond_stage_key: "txt"
    conditioning_key: crossattn-adm

I can see that DiffusionWrapper.forward passes the c_crossattn and c_adm inputs to diffusion_model when conditioning_key == 'crossattn-adm': https://github.com/Stability-AI/stablediffusion/blob/cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf/ldm/models/diffusion/ddpm.py#L1345
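For reference, here is a minimal, torch-free sketch of what that branch appears to do. It is a simplified paraphrase of the linked forward method, not a copy of it. Tensors are replaced with plain Python lists, each context is modelled as a flat list of token embeddings for one sample, and `fake_unet` and `wrapper_forward` are hypothetical stand-ins:

```python
def fake_unet(x, t, context=None, y=None):
    """Hypothetical stand-in for the UNet: echoes back what it receives."""
    return {"x": x, "t": t, "context": context, "y": y}


def wrapper_forward(x, t, conditioning_key, c_crossattn=None, c_adm=None):
    """Simplified sketch of the assumed 'crossattn-adm' routing."""
    if conditioning_key != "crossattn-adm":
        raise NotImplementedError(conditioning_key)
    # This branch needs both conditions; there is no fallback for c_adm.
    assert c_adm is not None, "crossattn-adm requires c_adm"
    # c_crossattn is a list of contexts joined along the token axis
    # (torch.cat(c_crossattn, 1) in the linked code; list concatenation here).
    context = [tok for ctx in c_crossattn for tok in ctx]
    # Text context feeds cross-attention; the ADM vector feeds the UNet's y.
    return fake_unet(x, t, context=context, y=c_adm)


out = wrapper_forward("z_t", 10, "crossattn-adm",
                      c_crossattn=[["tok0", "tok1"]], c_adm=[0.5, -0.5])
print(out["context"], out["y"])  # ['tok0', 'tok1'] [0.5, -0.5]
```

Calling `wrapper_forward` without `c_adm` fails the assertion, so whatever calls the wrapper has to supply both keyword arguments.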

However, I cannot find any code that actually builds the dictionary that LatentDiffusion.apply_model expects for its cond parameter in this case: https://github.com/Stability-AI/stablediffusion/blob/cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf/ldm/models/diffusion/ddpm.py#L886 https://github.com/Stability-AI/stablediffusion/blob/cf1d67a6fd5ea1aa600c4df58e5b47da45f6bdbf/ldm/models/diffusion/ddpm.py#L849
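To make the gap concrete, the sketch below shows how apply_model appears to normalize `cond` before calling the wrapper, based on my reading of the linked lines. It is a simplified paraphrase, and `normalize_cond` is a hypothetical name:

```python
def normalize_cond(cond, conditioning_key):
    """Sketch of how apply_model appears to turn `cond` into keyword args.

    Assumed from the linked code: a dict is passed through unchanged, while
    anything else is wrapped in a list under a single key.
    """
    if isinstance(cond, dict):
        # Hybrid case: the caller must already supply every keyword argument.
        return cond
    if not isinstance(cond, list):
        cond = [cond]
    key = "c_concat" if conditioning_key == "concat" else "c_crossattn"
    return {key: cond}


# A plain text embedding only ever yields c_crossattn, never c_adm.
print(normalize_cond("text_emb", "crossattn-adm"))  # {'c_crossattn': ['text_emb']}
```

If this reading is right, a non-dict condition under crossattn-adm reaches the wrapper with c_adm=None, so cond has to arrive at apply_model as a dict that already contains both keys.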

I am also not sure whether simply setting cond_stage_key to "txt" is the right approach when I want to use two types of conditions.
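One possible workaround, sketched under assumptions and not confirmed as the intended upstream approach: keep cond_stage_key: "txt" for the text half, read the class label from a second batch key, and return both under the keys the wrapper expects (for example from an overridden get_input in a LatentDiffusion subclass). The batch key "class_label", the helper name `build_hybrid_cond` and the two encoder callables are all hypothetical:

```python
def build_hybrid_cond(batch, encode_text, encode_class,
                      txt_key="txt", class_key="class_label"):
    """Hypothetical helper returning a cond dict for crossattn-adm.

    `txt_key` mirrors cond_stage_key; `class_key` is an assumed extra key.
    """
    text_ctx = encode_text(batch[txt_key])       # -> cross-attention context
    class_vec = encode_class(batch[class_key])   # -> ADM / class vector (y)
    # c_crossattn is a list of contexts; c_adm is passed as a single value.
    return {"c_crossattn": [text_ctx], "c_adm": class_vec}


# Toy encoders standing in for the text encoder and a class embedding.
batch = {"txt": ["a red chair"], "class_label": [3]}
cond = build_hybrid_cond(
    batch,
    encode_text=lambda prompts: [p.upper() for p in prompts],
    encode_class=lambda labels: [float(l) for l in labels],
)
print(cond)  # {'c_crossattn': [['A RED CHAIR']], 'c_adm': [3.0]}
```

Returning a dict means apply_model should pass it through without re-wrapping. On the UNet side, the y input is normally only accepted when the UNet config sets num_classes, so that setting likely needs checking too.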

Am I missing something, or is this part of the hybrid options (specifically crossattn-adm) not fully implemented?

Thank you!