PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

How to use the DMD checkpoint? #49

Closed EternalEvan closed 5 months ago

EternalEvan commented 5 months ago

Dear author: The DMD model gives impressive results in your paper, but I'm confused about how to use its checkpoint. In train_scripts/train_pixart_dmd.py, you load it with: https://github.com/PixArt-alpha/PixArt-sigma/blob/b20e77ec45f544e860195b47ed6f3ad3ab38a8c6/train_scripts/train_pixart_dmd.py#L205-L207

And in the config file, you seem to use a different model path: https://github.com/PixArt-alpha/PixArt-sigma/blob/b20e77ec45f544e860195b47ed6f3ad3ab38a8c6/configs/pixart_app_config/PixArt-DMD_xl2_img512_internalms.py#L16 which is not the diffusers model you have provided. So what is the right way to load your DMD model?

lawrence-cj commented 5 months ago

Oh, there's a little bug here. Replace

 load_from = "output/pretrained_models/pixart_alpha_512px_284000_diffusers" 

with

 load_from = "PixArt-alpha/PixArt-XL-2-512x512" 

Thanks for noticing.

EternalEvan commented 5 months ago

Thanks for your quick answer. Another question is how to load the pre-trained DMD model you have released. You did not mention how to download the DMD model in `asset/docs`. Should I directly change `config.load_from` to `PixArt-alpha/PixArt-Alpha-DMD-XL-2-512x512` so that it will automatically download from Hugging Face?

ApolloRay commented 5 months ago

```python
import torch
from diffusers import PixArtAlphaPipeline, Transformer2DModel, DDPMScheduler

weight_dtype = torch.float16
T5_token_max_length = 120
model_path = "{DMD_PATH}"
pipeline_load_from = "{ORIGINAL_ALPHA_PATH}"

# load the base PixArt-alpha pipeline without its transformer
pipe = PixArtAlphaPipeline.from_pretrained(
    pipeline_load_from,
    transformer=None,
    torch_dtype=weight_dtype,
)

# swap in the DMD transformer and its scheduler
pipe.transformer = Transformer2DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=weight_dtype
)
pipe.scheduler = DDPMScheduler.from_pretrained(model_path, subfolder="scheduler")

pipe.to("cuda")

# speed-up T5
pipe.text_encoder.to_bettertransformer()

image_save_path = ""
prompt = ""
# single-step DMD sampling: one inference step at timestep 400, no CFG
image = pipe(
    prompt=prompt,
    timesteps=[400],
    width=512,
    height=512,
    guidance_scale=1,
    num_inference_steps=1,
    num_images_per_prompt=4,
    use_resolution_binning=True,
    output_type="pil",
    max_sequence_length=T5_token_max_length,
).images[0]
image.save(image_save_path)
```

It works.

lawrence-cj commented 5 months ago

> Thanks for your quick answer. Another question is how to load the pre-trained DMD model you have released. You did not mention how to download the DMD model in `asset/docs`. Should I directly change `config.load_from` to `PixArt-alpha/PixArt-Alpha-DMD-XL-2-512x512` so that it will automatically download from Hugging Face?

For training from the base model, you should use the `PixArt-alpha/PixArt-XL-2-512x512` checkpoint I mentioned above. For fine-tuning, you can directly use `PixArt-alpha/PixArt-Alpha-DMD-XL-2-512x512`. Both of them will be downloaded automatically from Hugging Face.
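In config terms, a minimal sketch of the two options in the config file referenced above (only the `load_from` field is shown; all other fields are omitted here):

```python
# configs/pixart_app_config/PixArt-DMD_xl2_img512_internalms.py (sketch; other fields omitted)

# training DMD from the base model:
load_from = "PixArt-alpha/PixArt-XL-2-512x512"

# or, fine-tuning from the released DMD checkpoint:
# load_from = "PixArt-alpha/PixArt-Alpha-DMD-XL-2-512x512"
```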

ApolloRay commented 5 months ago

How much training data was used to train such a good DMD model?