facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Low Loss, Bad Results #44

Open HassanJbara opened 1 year ago

HassanJbara commented 1 year ago

Greetings. I'll preface my question with a disclaimer: I don't have much experience in ML and I'm still learning, so I apologize if this question sounds silly or too general.

I'm using this architecture and library to train a model of my own on a certain type of latents. If I set the training goal to predicting the noise at each step, my model reaches low loss values (~0.15), yet the samples it produces look nothing like the originals. Only setting the goal to predicting the original latent x works. I don't understand why. Could you point me to a potential cause, or give me some intuition for the problem?

Any help would be very appreciated, thank you.
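
One possible intuition for "low loss, bad samples" with noise prediction: the ε-MSE is averaged over all timesteps, but the sampler ultimately relies on an estimate of the clean latent, and the two are related by x̂₀ = (x_t − √(1−ᾱ_t)·ε̂) / √ᾱ_t. An error δ in ε̂ therefore becomes an error √((1−ᾱ_t)/ᾱ_t)·δ in x̂₀: tiny at low noise, very large near t = T. The plain-Python sketch below computes that amplification factor per timestep. It assumes the linear DDPM β schedule (1e-4 to 0.02 over 1000 steps), which we believe matches DiT's default `linear` schedule; the helper names are hypothetical.

```python
import math

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; alpha_bar_t = prod_{s<=t} (1 - beta_s)
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def x0_error_per_eps_error(alpha_bar):
    # x0_hat = (x_t - sqrt(1 - ab) * eps_hat) / sqrt(ab), so an error delta in eps_hat
    # shows up as an error sqrt((1 - ab) / ab) * delta in the implied x0 estimate.
    return math.sqrt((1.0 - alpha_bar) / alpha_bar)

if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    for t in (0, 250, 500, 750, 999):
        factor = x0_error_per_eps_error(alpha_bars[t])
        print(f"t={t:4d}  x0 error per unit eps error: {factor:.3f}")
```

So a single averaged loss value says little about how accurate the clean-latent estimates are at high noise levels. This is only one candidate explanation; it does not rule out other causes, such as a sampler configured for a different objective than the one used in training.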

smandava98 commented 8 months ago

Hi @HassanJbara, how did you get it to predict the original x latent? I'm confused: does the default code do that, or does it predict noise?

HassanJbara commented 8 months ago

The default is predicting noise, but there's an option somewhere in the code to predict the original latent, although I don't remember exactly where at this point.
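
For reference, as far as we can tell the option in question is the `predict_xstart` argument of `create_diffusion` in the repository's `diffusion/__init__.py`. It defaults to `False` (noise prediction), and the training and sampling scripts appear to call `create_diffusion` without it. Treat this as our reading of the code and check it against your checkout; if you change it for training, the sampling side needs the same setting, otherwise the sampler will misinterpret the model's output. The sketch below does not import the repository. It only illustrates, in plain Python with hypothetical helper names, what the switch changes: the regression target of the MSE loss. Any learned-variance term is left out.

```python
import random

def q_sample(x_start, noise, alpha_bar):
    # Forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    a, b = alpha_bar ** 0.5, (1.0 - alpha_bar) ** 0.5
    return [a * x0 + b * e for x0, e in zip(x_start, noise)]

def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def training_loss(model, x_start, alpha_bar, predict_xstart=False, rng=None):
    # Hypothetical helper: simple MSE loss for one sample at one noise level.
    rng = random.Random() if rng is None else rng
    noise = [rng.gauss(0.0, 1.0) for _ in x_start]
    x_t = q_sample(x_start, noise, alpha_bar)
    # The switch only changes what the network is asked to regress.
    target = x_start if predict_xstart else noise
    return mse(model(x_t, alpha_bar), target)

if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]
    oracle_x0 = lambda x_t, ab: x0  # a "perfect" x0-predictor for this one sample
    print(training_loss(oracle_x0, x0, 0.5, predict_xstart=True))   # target is x0
    print(training_loss(oracle_x0, x0, 0.5, predict_xstart=False))  # target is the sampled noise
```

The same network output is scored against a different target depending on the flag, which is why a checkpoint trained under one setting cannot be sampled under the other.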

smandava98 commented 8 months ago

Ah, I see the config now. Thanks. Did you ever figure out why you got different results when predicting noise vs. predicting latents? What caused your issue back then?

HassanJbara commented 8 months ago

Not sure; it's probably because of the task I was trying to teach the model. At the end of the day, predicting the original latents is also valid, and it worked, so I went with it.
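
The point that predicting the original latent is "also valid" can be made concrete. In DDPM-style ancestral sampling, either kind of output is turned into an estimate of x₀, which then enters the same posterior mean of q(x_{t−1} | x_t, x₀). The sketch below assumes a linear β schedule and the standard DDPM posterior; the helper names are hypothetical and this is not the repository's sampler. It also shows why the sampler must know the training objective: the conversion step differs between the two.

```python
import math

def linear_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule and its cumulative products alpha_bar_t
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars

def x0_from_output(out, x_t, alpha_bar, predict_xstart):
    # x0-parameterization: the network output already is the clean-latent estimate
    if predict_xstart:
        return list(out)
    # eps-parameterization: invert x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - n * e) / s for xt, e in zip(x_t, out)]

def posterior_mean(x0_hat, x_t, t, betas, alpha_bars):
    # Mean of q(x_{t-1} | x_t, x_0), with x_0 replaced by the model's estimate
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    c_x0 = betas[t] * math.sqrt(ab_prev) / (1.0 - ab_t)
    c_xt = (1.0 - ab_prev) * math.sqrt(1.0 - betas[t]) / (1.0 - ab_t)
    return [c_x0 * a + c_xt * b for a, b in zip(x0_hat, x_t)]
```

If an x0-trained model's output were passed through the ε branch (or the reverse), the x₀ estimate, and therefore every sampling step, would be wrong even though training looked fine.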

GuHuangAI commented 8 months ago

@HassanJbara Hello, I'm using the DiT architecture to train an image generation model, but the samples look like this. Could you please give me some advice?

[Image: model-3_sample-0. The top two are predictions; the bottom two are ground truth.]

GuHuangAI commented 8 months ago

I have solved it. The cause was that I had forgotten to add the positional embedding.
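
Why a missing positional embedding breaks generation: self-attention is permutation-equivariant, so without position information the patch tokens cannot tell where they sit in the image. DiT adds a fixed (non-learned) 2D sine-cosine embedding to the patch tokens. The plain-Python sketch below follows that general scheme (half the channels encode the patch row, half the column), but the function names are hypothetical and it is not the repository's code.

```python
import math

def sincos_1d(dim, positions):
    # Half the channels are sines, half cosines, at geometrically spaced frequencies
    assert dim % 2 == 0
    half = dim // 2
    omegas = [1.0 / 10000 ** (i / half) for i in range(half)]
    out = []
    for p in positions:
        angles = [p * w for w in omegas]
        out.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return out

def sincos_2d(dim, grid_size):
    # One row per patch (row-major); first dim/2 channels encode the row, the rest the column
    assert dim % 4 == 0
    rows = [r for r in range(grid_size) for _ in range(grid_size)]
    cols = [c for _ in range(grid_size) for c in range(grid_size)]
    emb_r = sincos_1d(dim // 2, rows)
    emb_c = sincos_1d(dim // 2, cols)
    return [er + ec for er, ec in zip(emb_r, emb_c)]

def add_pos_embed(tokens, pos_embed):
    # tokens: one vector per patch, in the same row-major order as pos_embed
    return [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos_embed)]
```

Every patch position gets a distinct vector, so two identical patches at different locations produce different tokens once the embedding is added; without it, they are indistinguishable to the transformer.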

yeeeeeeii commented 6 months ago

Hi. Could you tell me how to predict the original x latent? I can't find the option. Thanks a lot.