PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/

What are the extra 4 channels from the denoiser? #81

Closed · samedii closed this 1 month ago

samedii commented 2 months ago

I've been trying to figure it out from this: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/train_scripts/train.py#L122

Learned variance maybe?

It looks like they are not used in your finetuning script but I might be wrong.

lawrence-cj commented 2 months ago

Which line? Your link doesn't seem to point to the right place.

samedii commented 2 months ago

Sorry, I mean the 4 extra channels that the transformer outputs (it outputs 8). Only the first 4 channels are actually used in both finetuning and inference, as far as I can tell.
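
To illustrate what I mean, here is a minimal sketch (not the repository's actual code; the shapes and names are assumptions): the backbone returns twice the latent channels, and only the first half is consumed as the noise prediction.

```python
import torch

latent_channels = 4
# Stand-in for the transformer's output: 2 * latent_channels = 8 channels.
model_output = torch.randn(1, 2 * latent_channels, 32, 32)

# Split into the epsilon (noise) prediction and the extra channels.
eps_pred, extra = model_output.split(latent_channels, dim=1)

# Only eps_pred feeds the loss / sampling step; `extra` appears unused.
print(eps_pred.shape, extra.shape)  # torch.Size([1, 4, 32, 32]) for both
```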

samedii commented 2 months ago

If you want the specific line of code:

Is it a leftover from previously experimenting with other ModelVarType settings, and the final model was not trained with it? https://github.com/PixArt-alpha/PixArt-sigma/blob/master/diffusion/model/gaussian_diffusion.py#L798

lawrence-cj commented 1 month ago

Oh yes, for this part we just align with the original DiT implementation without changing it.
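
As a rough illustration of that convention (a sketch assuming a DiT-style learn_sigma flag, not this repository's exact code): when learn_sigma is enabled, the backbone's output channels are doubled, and the diffusion wrapper later splits the output, using or discarding the second half depending on the configured variance type.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Minimal stand-in for a DiT-style backbone; final projection only."""
    def __init__(self, in_channels=4, learn_sigma=True):
        super().__init__()
        out_channels = in_channels * 2 if learn_sigma else in_channels
        self.final = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.final(x)

x = torch.randn(1, 4, 32, 32)
out = ToyDenoiser()(x)                # shape: (1, 8, 32, 32)
eps, model_var = out.chunk(2, dim=1)  # with a fixed variance type, model_var
                                      # is simply dropped; with a learned one
                                      # it would feed the variational-bound term
```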

samedii commented 1 month ago

Thanks!