PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/

PixArt-Sigma-XL-2-512-MS does not have multi-aspect/resolution embeddings? #70

Closed LinB203 closed 2 months ago

LinB203 commented 2 months ago

Why is the weight named PixArt-Sigma-XL-2-512-MS? I guess the MS means multi-scale? Shouldn't it have multi-aspect/resolution embeddings?

https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-512-MS/discussions/1

Missing keys: adaln_single.emb.resolution_embedder.linear_2.bias, adaln_single.emb.aspect_ratio_embedder.linear_1.bias, adaln_single.emb.resolution_embedder.linear_1.bias, adaln_single.emb.resolution_embedder.linear_1.weight, adaln_single.emb.aspect_ratio_embedder.linear_2.bias, adaln_single.emb.aspect_ratio_embedder.linear_1.weight, adaln_single.emb.resolution_embedder.linear_2.weight, adaln_single.emb.aspect_ratio_embedder.linear_2.weight.
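
A quick way to double-check which keys a checkpoint actually ships with (a minimal sketch, assuming the standard diffusers layout of a transformer subfolder containing diffusion_pytorch_model.safetensors):

# Sketch: list embedder-related keys in the checkpoint on the Hub.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(
    "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
    "diffusion_pytorch_model.safetensors",
    subfolder="transformer",
)
with safe_open(path, framework="pt") as f:
    # No resolution_embedder/aspect_ratio_embedder keys appear for Sigma.
    print([k for k in f.keys() if "embedder" in k])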

LinB203 commented 2 months ago

I checked PixArt-alpha/PixArt-Sigma-XL-2-1024-MS. It also does not have these keys. But PixArt-alpha/PixArt-XL-2-1024-MS does have them.

LinB203 commented 2 months ago

I want to run inference at non-square resolutions such as 512×1024, not just 512×512 or 1024×1024.

LinB203 commented 2 months ago

There is no aspect_ratio_embedder or resolution_embedder in any of the PixArt-Sigma weights, so how can it work? Could you help me? I am confused about it. @lawrence-cj

lawrence-cj commented 2 months ago

Show me your whole command.

LinB203 commented 2 months ago

import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weight_dtype = torch.float16

# Loading only the transformer raises the missing-keys error below:
transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
    subfolder='transformer', cache_dir='./cache_dir',
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
ValueError: Cannot load <class 'diffusers.models.transformers.transformer_2d.Transformer2DModel'> from PixArt-alpha/PixArt-Sigma-XL-2-512-MS because the following keys are missing: 
 adaln_single.emb.aspect_ratio_embedder.linear_1.bias, adaln_single.emb.resolution_embedder.linear_1.weight, adaln_single.emb.aspect_ratio_embedder.linear_2.bias, adaln_single.emb.resolution_embedder.linear_2.bias, adaln_single.emb.resolution_embedder.linear_1.bias, adaln_single.emb.resolution_embedder.linear_2.weight, adaln_single.emb.aspect_ratio_embedder.linear_2.weight, adaln_single.emb.aspect_ratio_embedder.linear_1.weight. 
 Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
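
The error text itself names one escape hatch: passing low_cpu_mem_usage=False and device_map=None makes diffusers randomly initialize the missing weights instead of raising. A sketch of that route, though untrained embedders would hurt output quality, so it is not the fix adopted below:

# Sketch of the workaround the error message itself suggests; reuses the
# Transformer2DModel import and weight_dtype from the snippet above.
transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
    subfolder='transformer', cache_dir='./cache_dir',
    torch_dtype=weight_dtype,
    use_safetensors=True,
    low_cpu_mem_usage=False,  # randomly initialize missing embedder weights
    device_map=None,
)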

lawrence-cj commented 2 months ago

Add "use_additional_conditions" at the end of the transformer's config.json:

{
  "_class_name": "Transformer2DModel",
  "_diffusers_version": "0.28.0.dev0",
  "activation_fn": "gelu-approximate",
  "attention_bias": true,
  "attention_head_dim": 72,
  "attention_type": "default",
  "caption_channels": 4096,
  "cross_attention_dim": 1152,
  "double_self_attention": false,
  "dropout": 0.0,
  "in_channels": 4,
  "interpolation_scale": 1,
  "norm_elementwise_affine": false,
  "norm_eps": 1e-06,
  "norm_num_groups": 32,
  "norm_type": "ada_norm_single",
  "num_attention_heads": 16,
  "num_embeds_ada_norm": 1000,
  "num_layers": 28,
  "num_vector_embeds": null,
  "only_cross_attention": false,
  "out_channels": 8,
  "patch_size": 2,
  "sample_size": 64,
  "upcast_attention": false,
  "use_linear_projection": false,
  "use_additional_conditions": true  # add this line
}
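
If editing the cached config.json is inconvenient, the same flag can presumably be set at load time, since diffusers treats extra from_pretrained kwargs as config overrides. A sketch (note the follow-up below settles on false as the correct value for Sigma):

# Sketch, assuming diffusers' usual kwargs-as-config-overrides behavior:
transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
    subfolder='transformer', cache_dir='./cache_dir',
    torch_dtype=weight_dtype,
    use_safetensors=True,
    use_additional_conditions=False,  # the value for Sigma, per below
)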

LinB203 commented 2 months ago

Thank you for replying. I know "use_additional_conditions": true isn't in the original config.json, but if the model does not have use_additional_conditions, how can it run inference at any resolution?

LinB203 commented 2 months ago

These model weights (PixArt-alpha/PixArt-Sigma-XL-2-512-MS and PixArt-alpha/PixArt-Sigma-XL-2-1024-MS) do not have a resolution_embedder or aspect_ratio_embedder. Am I missing something? Do you have a tutorial that supports generation at any aspect ratio/resolution? @lawrence-cj

lawrence-cj commented 2 months ago

Change this line to "use_additional_conditions": false. Multi-scale images are generated according to the noise shape.
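
In other words, with the flag off, the output resolution comes entirely from the shape of the initial latent noise, which the pipeline derives from its height and width arguments. A minimal sketch of non-square generation (the prompt is illustrative):

import torch
from diffusers import PixArtSigmaPipeline

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
    torch_dtype=torch.float16,
).to("cuda")

# height/width set the noise shape, and hence the output resolution
image = pipe(
    "an astronaut riding a horse",
    height=512,
    width=1024,
).images[0]
image.save("astronaut_512x1024.png")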

LinB203 commented 2 months ago

> Change this line to "use_additional_conditions": false. Multi-scale images are generated according to the noise shape.

But the resolution_embedder and aspect_ratio_embedder are added during training, so don't they also need to be there during inference?

LinB203 commented 2 months ago

https://github.com/PixArt-alpha/PixArt-sigma/blob/master/diffusion/model/nets/PixArtMS.py#L188-L191 Are these lines not switched on during training?

lawrence-cj commented 2 months ago

https://github.com/PixArt-alpha/PixArt-sigma/blob/6ec1500b079a85e291625e2f5a0c935fd9913f12/diffusion/model/nets/PixArtMS.py#L187 Check it first.
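
The line being pointed at guards the micro-conditioning path behind a constructor flag. An illustrative stand-in, not the actual PixArtMS.py source (layer shapes are made up), showing why checkpoints trained with the flag off contain no embedder weights at all:

import torch
from torch import nn

class TimestepEmbedderWithMicroCond(nn.Module):
    def __init__(self, hidden_size, micro_condition=False):
        super().__init__()
        self.micro_condition = micro_condition
        self.t_embedder = nn.Linear(256, hidden_size)          # stand-in
        if self.micro_condition:                               # on for Alpha-MS, off for Sigma
            self.csize_embedder = nn.Linear(256, hidden_size)  # resolution
            self.ar_embedder = nn.Linear(256, hidden_size)     # aspect ratio

    def forward(self, t_emb, csize=None, ar=None):
        out = self.t_embedder(t_emb)
        if self.micro_condition:  # skipped entirely when the flag is off
            out = out + self.csize_embedder(csize) + self.ar_embedder(ar)
        return out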

LinB203 commented 2 months ago

> https://github.com/PixArt-alpha/PixArt-sigma/blob/6ec1500b079a85e291625e2f5a0c935fd9913f12/diffusion/model/nets/PixArtMS.py#L187
>
> Check it first.

But the flag is turned on in the config file: https://github.com/PixArt-alpha/PixArt-sigma/blob/6ec1500b079a85e291625e2f5a0c935fd9913f12/configs/pixart_alpha_config/PixArt_xl2_img1024_internalms.py#L32

lawrence-cj commented 2 months ago

This is PixArt-Alpha... not PixArt-Sigma

LinB203 commented 2 months ago

Sorry, I apologise for my stupidity.