Open HassanJbara opened 1 year ago
Hi @HassanJbara how did you get it to predict the original x latent? I am confused, does the default code do that or does it predict noise?
The default is to predict noise, but there's an option in the code somewhere to predict the original latent instead, although I don't remember exactly where at this point.
Ah, I see the config now. Thanks. Did you ever figure out why you were getting different results for predicting noise vs predicting latents? What were the reasons for your issue back then?
Not sure; it was probably because of the task I was trying to teach the model. At the end of the day, predicting the original latents is also valid, and it worked, so I went with it.
@HassanJbara Hello, I am using the DiT architecture to train an image generation model, but the generated images look like this. Could you please give me some advice? The top two are predictions, and the bottom two are ground truth.
I have solved it. The reason was that I forgot to add the positional encoding.
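To illustrate why a missing positional encoding can matter so much: plain self-attention treats its input as an unordered set of tokens, so without position information the model has no way to tell where each patch belongs. The sketch below is a toy NumPy demonstration of that property, not DiT's code. The helpers `sincos_pos_embed` and `self_attention`, the random weights, and the token sizes are all assumptions made for illustration (DiT itself, as far as I know, adds a fixed 2D sin-cos embedding to the patch tokens; the 1D version here is a simplification).

```python
import numpy as np

def sincos_pos_embed(num_tokens, dim):
    """Fixed 1D sin-cos positional embedding of shape (num_tokens, dim); dim must be even."""
    positions = np.arange(num_tokens)[:, None]
    omega = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = positions * omega[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over tokens x of shape (num_tokens, dim)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
num_tokens, dim = 6, 8
tokens = rng.standard_normal((num_tokens, dim))  # stand-in for patch embeddings
wq, wk, wv = (rng.standard_normal((dim, dim)) for _ in range(3))
perm = np.arange(num_tokens)[::-1]               # reverse the patch order
pos = sincos_pos_embed(num_tokens, dim)

# Without positions: shuffling the patches only shuffles the outputs.
no_pos_same = np.allclose(
    self_attention(tokens[perm], wq, wk, wv),
    self_attention(tokens, wq, wk, wv)[perm],
)

# With positions: the same shuffle now changes the outputs themselves.
with_pos_same = np.allclose(
    self_attention(tokens[perm] + pos, wq, wk, wv),
    self_attention(tokens + pos, wq, wk, wv)[perm],
)

print("order-blind without positions:", no_pos_same)
print("order-blind with positions:", with_pos_same)
```

In other words, without the embedding a shuffled image and the original look identical to the attention layers, which is consistent with outputs that lack coherent spatial structure.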
Hi. Could you tell me how to predict the original x latent? I can't find it. Thanks a lot.
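For anyone else looking for the switch: as far as I can tell, in the DiT repository it is the `predict_xstart` argument of `create_diffusion` in `diffusion/__init__.py` (default `False`, meaning noise prediction), but please verify this against your own checkout. The sketch below is not the repository's code. It is a minimal NumPy illustration of what such an option changes, namely the regression target of the MSE loss. The helper names (`make_alpha_bar`, `q_sample`, `training_target`, `mse_loss`), the schedule values, and the tensor shapes are assumptions for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x_start, noise, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar_t) * x_start + np.sqrt(1.0 - alpha_bar_t) * noise

def training_target(x_start, noise, predict_xstart):
    """Pick what the network output is regressed against."""
    return x_start if predict_xstart else noise

def mse_loss(model_output, target):
    return float(np.mean((model_output - target) ** 2))

rng = np.random.default_rng(0)
x_start = rng.standard_normal((4, 8))  # stand-in for a batch of clean latents
noise = rng.standard_normal((4, 8))
alpha_bar = make_alpha_bar()
x_t = q_sample(x_start, noise, alpha_bar[500])  # noised input the model would see

# A placeholder "model output" (all zeros) scored against each target.
dummy_output = np.zeros_like(x_t)
print("loss vs. noise target:  ", mse_loss(dummy_output, training_target(x_start, noise, False)))
print("loss vs. x_start target:", mse_loss(dummy_output, training_target(x_start, noise, True)))
```

The network input `x_t` is the same in both cases; only the quantity the output is compared against differs, so loss values from the two settings are not directly comparable.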
Greetings. I'll preface my question with a disclaimer that I don't have much experience in ML and I'm still exploring myself, so I apologize if this question may sound silly or too general.
I'm using this architecture and library to train a model of my own on a certain type of latents. If I set the training goal to predict the noise at each step, my model successfully reaches low loss values (~0.15), yet the samples it produces are nothing like the originals. Only setting the goal to predict the original x latent works. I don't understand why that is. Could you at least give me an idea of a potential cause, or an intuition for the problem?
Any help would be very appreciated, thank you.
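One general property that may be relevant here, offered as intuition rather than a diagnosis of this particular setup: a noise prediction is turned back into an estimate of the clean latent by inverting x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and that inversion multiplies any error in the predicted noise by sqrt(1 − ᾱ_t) / sqrt(ᾱ_t). At high noise levels this factor is large, so a noise loss that looks low does not necessarily imply an accurate clean latent. The NumPy sketch below assumes a linear beta schedule with 1000 steps and a made-up noise error of standard deviation 0.1; the function names are hypothetical.

```python
import numpy as np

# Assumed linear beta schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def x0_from_eps(x_t, eps, ab):
    """Invert x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps for x0."""
    return (x_t - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)

def eps_error_gain(ab):
    """Factor by which an error in predicted noise is scaled in the implied x0."""
    return np.sqrt(1.0 - ab) / np.sqrt(ab)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)             # stand-in for a clean latent
eps = rng.standard_normal(16)            # true noise
eps_err = 0.1 * rng.standard_normal(16)  # hypothetical prediction error

for t in (10, 500, 999):
    ab = alpha_bar[t]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    x0_hat = x0_from_eps(x_t, eps + eps_err, ab)  # x0 implied by an imperfect noise prediction
    rms = np.sqrt(np.mean((x0_hat - x0) ** 2))
    print(f"t={t:4d}  gain={eps_error_gain(ab):9.3f}  rms error in x0={rms:9.3f}")
```

The same noise error produces a much larger error in the implied x_0 at late timesteps than at early ones. Whether this explains the behaviour described above depends on the latents and the sampler, so it is worth checking per-timestep losses before drawing conclusions.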