explainingai-code / DDPM-Pytorch

This repo implements Denoising Diffusion Probabilistic Models (DDPM) in PyTorch

What changes would we need to make if we used our own dataset? #1

Open awais00012 opened 8 months ago

awais00012 commented 8 months ago

Thanks for the awesome explanation. Could you tell me what changes we need to make before training the model on our own data?

thatdev6 commented 5 months ago

> For images of different sizes, you would continue to do what you were already doing in your dataset class, that is, centre crop and resize to 64x64. Then train your model on these 64x64 images and generate an image with xt as 64x64. To avoid any confusion you can also change the im_size here to 64 (instead of 28).
>
> So basically the only change from the repo code that you should need to make is that centre cropping. You don't need to actually train a model to test this; just run the sampling script on a randomly initialized model and see if it still throws an error. If it does, just share it here and I will take a look.

Okay, I will get back to you with that. Also, should I change the im_channels to 3 for RGB pictures?
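For reference, a minimal sketch of the dataset-side change being described (this assumes a standard torchvision pipeline; the dataset class below is illustrative rather than the repo's actual one, and only `im_size`/`im_channels` follow the names used in this thread):

```python
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import Dataset


class CrackImageDataset(Dataset):
    """Illustrative dataset: centre-crop and resize RGB images to im_size x im_size."""

    def __init__(self, image_paths, im_size=64):
        self.image_paths = image_paths
        self.transform = T.Compose([
            T.Resize(im_size),       # shorter side -> im_size, aspect ratio preserved
            T.CenterCrop(im_size),   # then a square centre crop
            T.ToTensor(),            # (3, im_size, im_size), values in [0, 1]
            T.Normalize((0.5,) * 3, (0.5,) * 3),  # scale to [-1, 1], as DDPM training expects
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # convert('RGB') guarantees 3 channels, matching im_channels = 3 in the config
        im = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(im)
```

With something like this in place, setting im_size to 64 and im_channels to 3 in the config are the only other changes this thread ends up needing.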

thatdev6 commented 5 months ago


Okay, so I was able to get outputs now without changing anything except im_size and im_channels in the config and the data loader function.

But do you know the reason I am getting such noisy outputs? Currently the batch size is 4 and the number of epochs is 50. [image]

explainingai-code commented 5 months ago

This is actually fine. When I was training on an RGB dataset, I also used to get these outputs at first, so more training should ultimately lead to actual outputs (close to the dataset images). In these generation results, when you look at x0_400/500, were they closer to your actual dataset images? Also, did you end up training for more epochs? Did the outputs improve?

thatdev6 commented 5 months ago


When you say continued training, how would I achieve that? Doesn't the model overwrite itself? I am actually using the free resources of Google Colab for training, so how could I save my progress and train multiple times for better results?

explainingai-code commented 5 months ago

I mean train it for more epochs like 200 epochs.

thatdev6 commented 5 months ago


Okay, so more epochs. Also, do you think the reduced batch size is affecting results?

explainingai-code commented 5 months ago

That's something you would have to experiment with and tune, but I am assuming you are limited by compute anyway and 4 is the max you can go?

thatdev6 commented 5 months ago

Yes, 4 is the highest I can go. I will obviously need help in tuning the model parameters for better results, so I will keep you updated.

Thank you so much for the help in the meantime.

thatdev6 commented 5 months ago


Is there any way I can save my progress while training? What I want to do is, say, train up to 130 epochs, stop my training, and then continue training from epoch 130 again.

thatdev6 commented 5 months ago

This is my result after 100 epochs with batch size 6: [image]

Why are the generated outputs getting so dark? Shouldn't they start to mimic the actual images at some point?


thatdev6 commented 5 months ago

This is at x0_500: [image]

And this is at x0_999: [image]

thatdev6 commented 5 months ago

For reference, these are some dataset images: [images]

thatdev6 commented 5 months ago

I would also like to mention that, as of now, my training time per epoch is 40s, so training 100 epochs took me a little over an hour. I remember in your YouTube video you trained your model for around 3 hours over 60 epochs, so my dataset might also be a limiting factor here (257 images after split). Let me know what you think of this conclusion, and also how you would update the model params to get better results.

explainingai-code commented 5 months ago

> Is there any way I can save my progress while training? What I want to do is, say, train up to 130 epochs, stop my training, and then continue training from epoch 130 again.

For resuming training, this should already be happening when the code is loading the checkpoint here. So all you would need to do is, after 130 epochs, download the checkpoint, and before running the next 130 epochs, simply place this downloaded checkpoint in the right path.

> so my dataset might also be a limiting factor here (257 images after split)

Yes, 250 images is very little. The number that I gave was for training on MNIST, 60000 images of 28x28 on an Nvidia V100, where I trained for ~50 epochs with batch size 64, which means about 50000 steps. With 250 images and batch size 6, that is effectively about 40 steps per epoch, so even after 100 epochs that is actually just about 4000 steps. I would suggest doing the following things:

  1. If you can get more data, then that would definitely help.
  2. Continue training for longer if, by analyzing the x0-x999 images, you are continuing to see improvement in the model's generation capabilities.
  3. Use augmentations: flipping, and even cropping, depending on your images. I am assuming your goal is to generate images with those cracks, so you could take multiple crops from the same images, like below (see the augmentation sketch after the screenshot).
[Screenshot: example of multiple crops taken from the same dataset image]
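As a rough illustration of point 3, the kind of augmentation pipeline being suggested might look like the following (the transforms are standard torchvision ones; the 256 crop size and the vertical flip are illustrative choices, not something prescribed by the repo):

```python
import torchvision.transforms as T

# Sketch of an augmentation pipeline for a small dataset of large images.
# Every epoch the model sees a different random patch / flip of each image,
# which effectively increases the number of distinct training views.
augment = T.Compose([
    T.RandomCrop(256),             # random 256x256 patch from the full-resolution image
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),   # keep only if orientation does not matter for the cracks
    T.Resize(64),                  # down to the 64x64 training resolution
    T.ToTensor(),
    T.Normalize((0.5,) * 3, (0.5,) * 3),
])
```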
thatdev6 commented 5 months ago

So, are the checkpoints saved in the .pth file? Then I would have to transfer that file before starting the next round of training?

explainingai-code commented 5 months ago

Yes, download the .pth file after one round of training, and put it back in the necessary path before starting the second round.
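In code, the resume step being described would look roughly like this (a minimal sketch assuming the checkpoint is a plain `state_dict` saved with `torch.save`; the tiny model and the filename here are placeholders, not the repo's actual ones):

```python
import os
import torch
import torch.nn as nn

# Placeholder stand-in for the DDPM UNet; in practice rebuild the same model
# (same config) that produced the checkpoint in the previous Colab session.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU(), nn.Conv2d(8, 3, 3, padding=1))

ckpt_path = 'ddpm_ckpt.pth'  # illustrative filename

# Resume: if the checkpoint downloaded from the previous session has been
# uploaded to this path, load it before the training loop starts.
if os.path.exists(ckpt_path):
    model.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
    print(f'Resumed from {ckpt_path}')

# ... training loop for the next batch of epochs would run here ...

# Save again at the end (or periodically), then download this file from Colab
# so the next session can pick up from it.
torch.save(model.state_dict(), ckpt_path)
```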

thatdev6 commented 5 months ago

Okay, I will work on the suggestions. In your opinion, what is the better alternative: opting for a larger dataset with images from different sources and sizes, or the same images cropped to form a larger dataset?

explainingai-code commented 5 months ago

I would say a larger dataset is beneficial, and then with cropping you can further increase the number of images the diffusion model gets to see during training. But obviously, if that cannot be done due to some constraint, then I would still suggest trying out just the cropping solution.

thatdev6 commented 5 months ago

[Images: samples after crop and resize in the data loader]

These are my images after the crop and resize in the data loader function. Are these okay? I think they have lost a lot of quality.

Also, I do have a larger dataset of around 45k images, but the images in it are inconsistent in every manner (size, quality), so what effect will my current crop in the data loader function have on images that are not 3264 x 2448 or 2448 x 3264 in size?

explainingai-code commented 5 months ago

For the 45K images, my guess is that the centre crop and resize to 64x64 should handle the inconsistencies in size (I don't know how inconsistent they are in quality). But if you have 45K images, I would say why not try with that, just to see how good of an output quality you get from DDPM.

thatdev6 commented 5 months ago

Yes, that is the goal, right? But training on those images will take a considerable amount of time, and if the output is still noisy, all the time spent will have been wasted.

Also, are the black outputs normal? Currently I am saving my progress and increasing epochs on the 257 images, but at 130 epochs the final outputs were mostly black. Don't they indicate that the final noiseless images will be black, or is this part of the process?


explainingai-code commented 5 months ago

Yes, diffusion models require a decent amount of data to train, so unless you throw more compute at it, I don't see any other way to reduce the time.

The single-colour outputs are normal during the training process (as I mentioned, it happened during my training as well). And I would suggest not comparing based on epochs: 100 epochs on a dataset of 200 images is not the same as 100 epochs on a dataset of 50000 images. Rather, use the number of steps/iterations that have happened in your training; in your case that is just ~5K steps (compared to the ~80K steps that I used in the video). I don't think you need as many steps as that, because you have less variation in your dataset, but I just wanted to give some perspective.
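To make the epochs-vs-steps point concrete, the step count can be estimated as steps_per_epoch x epochs, with steps_per_epoch = ceil(dataset_size / batch_size). A quick sketch with the numbers from this thread:

```python
import math

def total_steps(num_images: int, batch_size: int, epochs: int) -> int:
    """Approximate number of optimizer updates over a whole training run."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs

# MNIST-style run referenced above: 60000 images, batch size 64, ~50 epochs
print(total_steps(60_000, 64, 50))   # 46900 -> roughly the ~50K steps mentioned above
# This thread's dataset: 257 images, batch size 6, 100 epochs
print(total_steps(257, 6, 100))      # 4300 -> the few-thousand-step range mentioned above
```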

thatdev6 commented 5 months ago


[image] I am finally starting to see results on the 257-image dataset after 400 epochs of training. I am assuming there are always going to be noisy images in the generated batch, even if I train on the 45k images for around 500 epochs.

Also, is there a way I can rescale the generated images to a higher resolution after generation?

explainingai-code commented 5 months ago

Great. I don't think that assumption is correct; once your model has converged (it looks like that point may be somewhere around 1000 epochs), it will not produce these noisy images at all. And with 45K images, after around 200 epochs you will most likely be able to see decent outputs (obviously you will have to experiment to confirm that).

For higher resolution, you can try pre-trained super-resolution models (just searching the web should give you some models, and you can test them to see how good a result you get). Or you can go the Stable Diffusion route: first train an autoencoder to convert from 256x256 to 32x32, train the diffusion model on the 32x32 latents, generate 32x32 latents at sampling time, and then decode them to 256x256 using the autoencoder. But that would require training multiple models (diffusion and autoencoder).
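As a quick baseline before reaching for a learned super-resolution model, plain interpolation of the generated tensors is easy to try (this is just a sketch; bicubic upsampling will look soft, and a pre-trained SR model or the LDM route described above is what would add real detail):

```python
import torch
import torch.nn.functional as F

# samples: a batch generated by the DDPM sampler, shape (B, 3, 64, 64), values in [-1, 1].
# A random tensor stands in here purely for illustration.
samples = torch.rand(4, 3, 64, 64) * 2 - 1

upscaled = F.interpolate(samples, size=(256, 256), mode='bicubic', align_corners=False)
upscaled = upscaled.clamp(-1, 1)  # bicubic interpolation can overshoot the valid range
print(upscaled.shape)  # torch.Size([4, 3, 256, 256])
```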

thatdev6 commented 5 months ago

Hello, it's me again. I have a question: can this model generate images based on prompts? For example, say I prompt it to generate an image with snow and cracks, is this achievable?

explainingai-code commented 5 months ago

Hello :) Yes, the diffusion model can be conditioned on class (using class embeddings) or text (using cross attention). But this repo does not have that; this one only allows you to generate images unconditionally. I do have it in the Stable Diffusion repo (https://github.com/explainingai-code/StableDiffusion-PyTorch), so you can either use that, or you can look at how class/text conditioning is achieved there and replicate the same with DDPM.
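For a sense of what class conditioning involves (this is a generic sketch, not code from this repo or the Stable Diffusion repo), the usual trick is to add a learned class embedding to the timestep embedding that the UNet already consumes:

```python
import torch
import torch.nn as nn

class ClassConditionedTimeEmbedding(nn.Module):
    """Sketch: combine the DDPM timestep embedding with a learned class embedding.

    The summed vector is passed to the UNet blocks wherever the plain time
    embedding would normally go, so the model learns class-dependent denoising.
    """

    def __init__(self, num_classes: int, emb_dim: int):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, emb_dim)

    def forward(self, t_emb: torch.Tensor, class_labels: torch.Tensor) -> torch.Tensor:
        # t_emb: (B, emb_dim) timestep embedding, class_labels: (B,) integer class ids
        return t_emb + self.class_emb(class_labels)


# Illustrative usage with made-up shapes
t_emb = torch.randn(4, 128)
labels = torch.tensor([0, 1, 2, 1])
cond = ClassConditionedTimeEmbedding(num_classes=3, emb_dim=128)
print(cond(t_emb, labels).shape)  # torch.Size([4, 128])
```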

dangdinh17 commented 1 month ago

Hello, I have a question about my model. I want to train on my data with the config params shown in the attached image, and my images are in JPG format. I have tried other methods like DataParallel but it doesn't work, so please help me with this.

explainingai-code commented 1 month ago

@dangdinh17 Can you tell me what error you are facing? Out of memory?

dangdinh17 commented 1 month ago

Yes, my error is this: [image]

explainingai-code commented 1 month ago

Yeah, can you try with 64x64? In the config set im_size to 64, and in the dataset class's __getitem__ method, resize the image to 64x64. Can you see if that gets rid of the error?

dangdinh17 commented 1 month ago

I have tried with 64x64 and it worked, but I want to train with the shape 128x128 because of my study. Let me introduce my study, and could you give me some essential suggestions, please? I have a dataset of 150x150 images, and I have blurred images with motion blur, and I want to use the DDPM model to restore the images at the highest quality. So I want to train my model with the shape 128x128 or 256x256 to suit my data shape. So I have some questions:

explainingai-code commented 1 month ago

If your images are from a dataset (or match a dataset) that has a super-resolution model available, then you can train DDPM on 64x64 images and then use that super-resolution model checkpoint to get 128x128 images. Or you can train a super-resolution model yourself. The second option is, like you said, to try an LDM: your trained autoencoder takes care of converting 256x256 images to 64x64 latents (as well as converting the 64x64 generated latents back to 256x256), and the diffusion model is trained on the smaller 64x64 latent images.

dangdinh17 commented 1 month ago

Oh yes, I see. Thank you so much.

dangdinh17 commented 1 month ago

I have another question: if my data involves motion blur or exposure blur, will it work if I only use the original code? Or must I train the model by adding more noise types, like motion blur and light blur, rather than only Gaussian noise?

explainingai-code commented 1 month ago

@dangdinh17, apologies for the late reply, but I have responded to your issue on the Stable Diffusion repo. Do take a look at the repo mentioned in that reply: https://github.com/explainingai-code/StableDiffusion-PyTorch/issues/21#issuecomment-2248046225. I think that implementation does exactly what you need.