VinAIResearch / LFM

Official PyTorch implementation of the paper: Flow Matching in Latent Space
https://vinairesearch.github.io/LFM/
GNU Affero General Public License v3.0

Questions about normalizing images in training. #8

Closed zjlww closed 7 months ago

zjlww commented 7 months ago

I realized that you normalize the images during training. Is this common in training and evaluating image generative models? I don't think other models (LDM, DDPM, etc.) do this during training. Is the FID comparison in your paper still fair with this normalization in place?

from torchvision import transforms

# ToTensor() maps pixels to [0, 1]; Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
# then rescales them to [-1, 1].
train_transform = transforms.Compose(
    [
        transforms.Resize(args.image_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)
hao-pt commented 7 months ago

Hi,

During the training stage, we normalize input images to the range [-1, 1]. This is standard practice in image generation. If you double-check the code of LDM, it uses taming.data.imagenet.ImagePaths (see https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/data/base.py#L51) to load input images and normalize them to [-1, 1].
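
For reference, a minimal sketch of that convention (mapping uint8 pixels in [0, 255] to floats in [-1, 1], as in the taming-transformers loader; the file name is a placeholder, not code from either repository):

import numpy as np
from PIL import Image

# Load an RGB image and map its uint8 pixels [0, 255] to floats in [-1, 1],
# the same range the training transform produces.
image = np.array(Image.open("example.jpg").convert("RGB"), dtype=np.float32)
image = image / 127.5 - 1.0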

However, in the sampling stage, we rescale the output images to be in the range [0, 1] before saving. You can check it out here.
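
As an illustration (a minimal sketch under that assumption, not the repository's exact sampling code), the model output in [-1, 1] can be mapped back to [0, 1] before saving:

import torch
from torchvision.utils import save_image

# 'samples' is assumed to be a batch of generated images in [-1, 1].
samples = torch.clamp(samples * 0.5 + 0.5, 0.0, 1.0)  # rescale to [0, 1]
save_image(samples, "samples.png")  # placeholder output path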

Therefore, the FID comparison remains fair.

Thanks.

zjlww commented 7 months ago

Thank you so much for the prompt and detailed reply!