tin-sely opened 5 months ago
It looks like it's not meant for progressive scaling? I guess the best option would be to train a lower-resolution network and then copy the relevant weights into a higher-res one.

Another thing I was curious about was the inputs:
```python
def forward(self, x, sigma, aug_cond=None, class_cond=None, mapping_cond=None):
```
`x`, `sigma`, and `class_cond` are clear, but do you have any more details on `aug_cond` and `mapping_cond`?
@tin-sely I believe `aug_cond` is for non-leaky augmentations. When an input image is augmented during training, a description of how that image was augmented is also given to the generator (as `aug_cond`, augmentation conditioning), so that the generator eventually learns to generate either augmented or non-augmented images depending on the value of the `aug_cond` input.
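To make the idea concrete, here is a minimal sketch of how an augmentation description could be encoded as a conditioning vector. This is illustrative only, not the actual k-diffusion encoding; the function name and the choice of features are assumptions. The key property is that the all-zero vector corresponds to "no augmentation", so you can pass zeros at sampling time to request clean images:

```python
import numpy as np

def make_aug_cond(flipped: bool, angle_deg: float, scale: float) -> np.ndarray:
    """Encode augmentation parameters as a small conditioning vector.

    Hypothetical encoding (not the real k-diffusion one): the identity
    augmentation (no flip, 0 degrees, scale 1.0) maps to the zero vector.
    """
    theta = np.deg2rad(angle_deg)
    return np.array([
        float(flipped),   # horizontal-flip indicator
        np.sin(theta),    # rotation, 0 when angle is 0
        np.log(scale),    # log-scale, 0 when scale is 1.0
    ], dtype=np.float32)

# During training: describe the augmentation that was actually applied.
aug_cond = make_aug_cond(flipped=True, angle_deg=15.0, scale=1.1)

# At sampling time: all zeros asks the model for a non-augmented image.
no_aug = make_aug_cond(flipped=False, angle_deg=0.0, scale=1.0)
```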
I believe `mapping_cond` is an older name for `aug_cond`, used in the non-transformer model configs (the ones that use `KarrasAugmentWrapper`, which takes the `aug_cond` tensor and gives it to the model as `mapping_cond`).
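In other words, the wrapper essentially forwards the same tensor under the older keyword. A toy sketch of that renaming step (the real `KarrasAugmentWrapper` also applies the augmentations themselves; the class and function names here are illustrative):

```python
class AugmentWrapperSketch:
    """Toy stand-in for an augment wrapper: forwards `aug_cond` to the
    inner model under the older keyword name `mapping_cond`."""

    def __init__(self, inner_model):
        self.inner_model = inner_model

    def forward(self, x, sigma, aug_cond=None, **kwargs):
        # The inner (non-transformer) model expects the augmentation
        # description under the name `mapping_cond`.
        return self.inner_model(x, sigma, mapping_cond=aug_cond, **kwargs)

# Toy inner model that just records which keyword arguments it received.
def toy_model(x, sigma, mapping_cond=None):
    return {"x": x, "sigma": sigma, "mapping_cond": mapping_cond}

wrapped = AugmentWrapperSketch(toy_model)
out = wrapped.forward("img", 1.0, aug_cond="aug-params")
```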
thanks a bunch @madebyollin! ✨
My understanding is that you use `aug_cond` when you want to provide the model with information about the augmentations via Fourier features:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L657
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L658
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L718
On the other hand, if you use `mapping_cond`, the condition is fed directly into a linear layer, as shown here:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L660
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L720
Both embeddings are then fed into the `MappingNetwork`: https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L721
But getting more clarity on this would definitely help!
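For what it's worth, the two paths can be sketched in a few lines of numpy. This is a simplified illustration of the structure described above, not the actual k-diffusion implementation; all names, dimensions, and the random-Fourier-feature formulation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_aug, d_map, d_model = 9, 4, 16

W_fourier = rng.normal(size=(d_aug, 32))  # fixed random frequencies
W_aug = rng.normal(size=(64, d_model))    # linear layer after Fourier features
W_map = rng.normal(size=(d_map, d_model)) # direct linear layer for mapping_cond

def fourier_features(v):
    """Random Fourier features: project, then take cos and sin."""
    proj = 2 * np.pi * v @ W_fourier
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

def embed(aug_cond, mapping_cond):
    # aug_cond path: Fourier features, then a linear layer.
    emb = fourier_features(aug_cond) @ W_aug
    # mapping_cond path: straight into a linear layer.
    emb = emb + mapping_cond @ W_map
    # In the real model this combined embedding then goes through the
    # mapping network (an MLP) before conditioning the transformer blocks.
    return emb

emb = embed(np.zeros(d_aug), np.zeros(d_map))
```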
Hey,

Loved your paper, and thanks a bunch for providing the code!

I have a quick question: how do you scale and train the network (HDiT) for increased resolutions? I saw you mentioned here: https://github.com/crowsonkb/k-diffusion/issues/14#issuecomment-1199475244 that you first need to build the entire network and then skip layers, but I'm not sure if this also applies to the new architecture?

Many thanks!