Closed: samedii closed this 1 month ago
Which line? Your link doesn't seem to point to the right place.
Sorry, I mean the 4 extra channels that the transformer outputs (it outputs 8). Only the first 4 channels are actually used in both finetuning and inference, as far as I can tell.
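To make the question concrete, here is a minimal sketch of the split as I understand it: the output has twice as many channels as the latent, and only the first half is treated as the noise prediction. The function name `split_prediction` and the list-of-planes layout are made up for illustration; this is not code from the repo.

```python
def split_prediction(channels, in_channels=4):
    """Split one sample's output channels into (noise, extra) halves.

    `channels` is a list of per-channel planes, e.g. 8 planes for a
    4-channel latent. Name and layout are assumptions for illustration.
    """
    if len(channels) != 2 * in_channels:
        raise ValueError(
            f"expected {2 * in_channels} channels, got {len(channels)}"
        )
    noise = channels[:in_channels]   # first half: the part that is actually used
    extra = channels[in_channels:]   # second half: the channels in question
    return noise, extra


# Toy 8-channel output where each "plane" is just a label.
output = [f"ch{i}" for i in range(8)]
noise, extra = split_prediction(output)
print(noise)  # ['ch0', 'ch1', 'ch2', 'ch3']
print(extra)  # ['ch4', 'ch5', 'ch6', 'ch7']
```

With real tensors the same idea would be a split along the channel dimension; the question is what, if anything, consumes the second half.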
If you want LOC then
Is it a leftover from earlier experiments with another ModelVarType, meaning the final model was not trained with it?
https://github.com/PixArt-alpha/PixArt-sigma/blob/master/diffusion/model/gaussian_diffusion.py#L798
Oh yes, for this part we simply followed the original DiT implementation without changing it.
Thanks!
I've been trying to figure it out from this: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/train_scripts/train.py#L122
Learned variance maybe?
It looks like they are not used in your finetuning script but I might be wrong.
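For reference, this is roughly what I mean by learned variance: in the DiT-style "learned range" setup, the extra channels hold a value `v` in [-1, 1] that interpolates, in log space, between the two variance bounds for the current timestep. The sketch below uses plain scalars, and the function name and example numbers are hypothetical; whether PixArt-sigma actually trains or uses these channels is exactly what I'm unsure about.

```python
import math


def learned_range_log_variance(v, beta_t, posterior_variance_t):
    """Interpolate between the lower and upper log-variance bounds.

    v                    -- raw value from the extra channels, assumed in [-1, 1]
    beta_t               -- forward-process variance at step t (upper bound)
    posterior_variance_t -- posterior variance at step t (lower bound)
    """
    min_log = math.log(posterior_variance_t)
    max_log = math.log(beta_t)
    frac = (v + 1) / 2  # map [-1, 1] to [0, 1]
    return frac * max_log + (1 - frac) * min_log


# Hypothetical numbers, only to show the two endpoints.
beta_t, posterior_t = 0.02, 0.01
print(math.exp(learned_range_log_variance(1.0, beta_t, posterior_t)))   # upper bound
print(math.exp(learned_range_log_variance(-1.0, beta_t, posterior_t)))  # lower bound
```

If the sampler only ever takes the first 4 channels, none of this would be exercised, which matches what I think I'm seeing.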