PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/

*** TypeError: PixArtAlphaCombinedTimestepSizeEmbeddings object argument after ** must be a mapping, not NoneType #126


nighting0le01 commented 3 days ago

```python
import torch
from diffusers import Transformer2DModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# positional-embedding interpolation scale per base resolution
interpolation_scale = {256: 0.5, 512: 1, 1024: 2, 2048: 4}
transformer = Transformer2DModel(
    sample_size=1024 // 8,
    num_layers=28,
    attention_head_dim=72,
    in_channels=4,
    out_channels=8,
    patch_size=2,
    attention_bias=True,
    num_attention_heads=16,
    cross_attention_dim=1152,
    activation_fn="gelu-approximate",
    num_embeds_ada_norm=1000,
    norm_type="ada_norm_single",
    norm_elementwise_affine=False,
    norm_eps=1e-6,
    caption_channels=4096,
    interpolation_scale=interpolation_scale[1024],
    use_additional_conditions=False,
).to(device)

bs, in_channels, latent_height, latent_width = 1, 4, 128, 128
sample_inputs_pytorch = [
    torch.randn(bs, in_channels, latent_height, latent_width, dtype=torch.float32).to(device),  # latents
    torch.randn(bs, 300, 4096, dtype=torch.float32).to(device),  # T5 text embeddings
    torch.randint(0, 1000, (bs,), dtype=torch.float32).to(device),  # timesteps
]

# This call raises the TypeError below because added_cond_kwargs is not passed:
out_pytorch_exported = transformer(
    hidden_states=sample_inputs_pytorch[0],
    encoder_hidden_states=sample_inputs_pytorch[1],
    timestep=sample_inputs_pytorch[2],
)
```

```
*** TypeError: PixArtAlphaCombinedTimestepSizeEmbeddings object argument after ** must be a mapping, not NoneType
```

What resolution and aspect ratio should be passed in `added_cond_kwargs` for a transformer input of shape 1x4x128x128?
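
The error itself points at the immediate problem: the forward pass unpacks `added_cond_kwargs` with `**`, so it must be a mapping even when no additional conditions are used. Below is a minimal sketch of a call that avoids the TypeError, assuming the convention of diffusers' `PixArtAlphaPipeline`, which always passes a dict with `resolution` and `aspect_ratio` keys (set to `None` when `use_additional_conditions=False`). The tensor values in the commented variant are an assumption based on a 1x4x128x128 latent corresponding to a 1024x1024 image under the VAE's 8x downscale factor.

```python
# added_cond_kwargs must be a dict (never None); with
# use_additional_conditions=False, None entries should suffice.
added_cond_kwargs = {"resolution": None, "aspect_ratio": None}

# Assumption: if the model were built with use_additional_conditions=True
# (PixArt-alpha 1024px micro-conditioning), a 1x4x128x128 latent maps to a
# 1024x1024 image (VAE downscale factor 8), so real tensors are needed:
# added_cond_kwargs = {
#     "resolution": torch.tensor([[1024.0, 1024.0]], device=device),
#     "aspect_ratio": torch.tensor([[1.0]], device=device),
# }

out = transformer(
    hidden_states=sample_inputs_pytorch[0],
    encoder_hidden_states=sample_inputs_pytorch[1],
    timestep=sample_inputs_pytorch[2],
    added_cond_kwargs=added_cond_kwargs,
)
```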