dome272 / Diffusion-Models-pytorch

PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
Apache License 2.0
1.11k stars, 256 forks

Training Time #2

Open chang0517 opened 1 year ago

chang0517 commented 1 year ago

Hi. I'm a newbie studying diffusion models. First, thank you for the interesting video and code.

I have some questions about the code and implementation.

  1. Is conditional image generation possible for 32x32 CIFAR-10 images? If so, is it possible to adjust the UNet structure and the various parameters? After training and running inference, the images are not generated well. If there is a reason, I would like to know what it is.

  2. Can you tell me how long training takes on 64x64 images?

Thank you.

dome272 commented 1 year ago

Hey there, thanks for the nice comment.

  1. To help you better, I would need more information on your training run. What exactly did you change in my code? Did you just change the path to the CIFAR-10 dataset at 32x32? If so, you also need to change a few things in the code. I assume you already changed what is needed to make the training code run: the dataloader resizing, the training image size, and the inputs to the self-attention blocks. Can you confirm that you changed all of these? If so, I would need more information on what your sampled images looked like, how long you trained, and what the MSE metric looks like (it's being logged to tensorboard so you can just open it like this).

  2. I trained for 300 epochs on the 64x64 datasets (CIFAR-10 and Landscape), which took around 3 days on an RTX 3090.
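One way to avoid editing "the inputs to the self-attention blocks" every time the image size changes is to read the spatial size from the tensor instead of storing it. The block below is only a sketch of that idea, not the code from this repository: the class name `SizeAgnosticSelfAttention`, the choice of 4 heads, and the feed-forward layout are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SizeAgnosticSelfAttention(nn.Module):
    """Self-attention over the spatial positions of a (B, C, H, W) feature map.

    Hypothetical sketch: H and W are read from the input at run time, so the
    block does not need a hard-coded spatial size.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # channels must be divisible by num_heads.
        self.mha = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ln = nn.LayerNorm(channels)
        self.ff = nn.Sequential(
            nn.LayerNorm(channels),
            nn.Linear(channels, channels),
            nn.GELU(),
            nn.Linear(channels, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Flatten spatial positions into a sequence: (B, H*W, C).
        seq = x.flatten(2).transpose(1, 2)
        seq_ln = self.ln(seq)
        attn, _ = self.mha(seq_ln, seq_ln, seq_ln)
        seq = seq + attn            # residual around attention
        seq = seq + self.ff(seq)    # residual around feed-forward
        # Restore the original (B, C, H, W) layout.
        return seq.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = SizeAgnosticSelfAttention(channels=128)
    for size in (32, 16):
        out = block(torch.randn(2, 128, size, size))
        print(size, tuple(out.shape))
```

Because the output shape always matches the input shape, the same block could in principle be dropped in at any resolution. Whether it matches the behaviour of the original blocks closely enough for existing checkpoints is not something this sketch guarantees.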

karkay22 commented 1 year ago

Hi, thanks for your video, which was really interesting and well explained.

I'm sorry, I'm new to diffusion models and I'm trying to train your model with CFG on an MNIST dataset like this one: https://www.kaggle.com/datasets/scolianni/mnistasjpg or on CIFAR10-32.

I ran into some trouble: I changed the dataloader resizing to 40 and the image size to 32, but got an error: shape '[-1, 128, 32, 32]' is invalid for input of size 458752.

Also, I'm not sure what I have to change in the attention blocks. I tried multiple things for the inputs of each layer but can't find the right structure. Could you tell me what I have to change and which values I should use instead?

Hope you'll be able to help me.

Thank you
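The numbers in the error message above can be checked by hand. A reshape to `[-1, 128, 32, 32]` only works if the element count is a multiple of 128 × 32 × 32 = 131072, and 458752 is not. The sketch below tests which square feature-map sizes would divide evenly; the helper name `batch_sizes_for_view` is made up for this example, and the batch sizes it reports are inferred from arithmetic only, not stated anywhere in the thread.

```python
def batch_sizes_for_view(numel, channels, sizes):
    """For each candidate spatial size, return the batch size that makes
    numel == batch * channels * size * size, or None if none exists."""
    result = {}
    for size in sizes:
        per_sample = channels * size * size
        batch, remainder = divmod(numel, per_sample)
        result[size] = batch if remainder == 0 else None
    return result


if __name__ == "__main__":
    # Values taken from the error message: 128 channels, 458752 elements.
    print(batch_sizes_for_view(458752, 128, sizes=(32, 16, 8)))
    # → {32: None, 16: 14, 8: 56}
```

So the tensor cannot be a 128-channel 32x32 feature map, but it is consistent with, for example, 14 samples at 16x16 or 56 samples at 8x8. Either reading suggests that the attention block was still configured for a larger spatial size than the feature map it received after switching to 32x32 inputs.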