PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

What's different between the pixart-alpha and pixart-sigma arch? #123

Open foreverpiano opened 6 days ago

foreverpiano commented 6 days ago

Is there any guidance on transferring code that previously supported pixart-alpha so that it also supports pixart-sigma? @lawrence-cj

lawrence-cj commented 6 days ago

Which one?

foreverpiano commented 6 days ago

Which one?

When migrating from alpha to sigma, is it true that the architecture needs no changes? Looking at the diffusers library, the two implementations are basically the same, except that sigma drops the add_cond_kwargs argument (Section 6.1).
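For reference, a minimal sketch for checking that point: diff the two transformer configs that diffusers publishes, without downloading any weights. The alpha checkpoint id PixArt-alpha/PixArt-XL-2-1024-MS is an assumption here; the sigma id is the one from the script below.

from diffusers import Transformer2DModel

# Load only the configs (no weights) for the alpha and sigma transformers.
alpha_cfg = Transformer2DModel.load_config(
    "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer"
)
sigma_cfg = Transformer2DModel.load_config(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="transformer"
)

# Print only the config keys whose values differ between the two models.
for key in sorted(set(alpha_cfg) | set(sigma_cfg)):
    if alpha_cfg.get(key) != sigma_cfg.get(key):
        print(f"{key}: alpha={alpha_cfg.get(key)!r}  sigma={sigma_cfg.get(key)!r}")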

Also, I'd like to ask: can we generate higher-resolution images (4K and beyond)? If so, how should the script below be modified?

from diffusers import Transformer2DModel
from diffusers_patches import pixart_sigma_init_patched_inputs, PixArtSigmaPipeline
import torch

# Replace diffusers' _init_patched_inputs hook with the PixArt-Sigma variant so the
# Sigma transformer loads correctly (the assert fails on older diffusers releases).
assert getattr(Transformer2DModel, '_init_patched_inputs', False), "Need to upgrade diffusers: pip install git+https://github.com/huggingface/diffusers"
setattr(Transformer2DModel, '_init_patched_inputs', pixart_sigma_init_patched_inputs)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weight_dtype = torch.float16

transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", 
    subfolder='transformer', 
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

# Enable memory optimizations.
# pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
image.save("./cactus.png")
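For the 4K question, a sketch of the changes I would expect on top of the script above, reusing its imports and variables (device, weight_dtype, prompt). The 4K checkpoint id PixArt-alpha/PixArt-Sigma-XL-2-4K-MS is an assumption, not verified here.

transformer = Transformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-4K-MS",  # assumed 4K checkpoint id
    subfolder='transformer',
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)
# pipe.enable_model_cpu_offload()  # 4K activations are large; offloading may help on smaller GPUs

# Request the output resolution explicitly (both sides must be divisible by 8).
# Pushing far beyond the resolution a checkpoint was trained on usually degrades quality.
image = pipe(prompt, width=3840, height=2160).images[0]
image.save("./cactus_4k.png")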