MediaBrain-SJTU / LED

[CVPR2023] Leapfrog Diffusion Model for Stochastic Trajectory Prediction

questions about stage 1 training #6

Open iyuner opened 1 year ago

iyuner commented 1 year ago

Hi,

Thank you for sharing your code. As mentioned in your paper, there are two stages of training: the first trains the denoising diffusion model, and the second focuses on the leapfrog initializer. The repo seems to provide only the stage 2 training code, which directly loads a pretrained checkpoint of the denoising diffusion model. Could you also provide the code for stage 1 training? Do you use the leapfrog initializer in the first stage? If so, what initial values did you use for the estimated mean, variance, and sample prediction? Thanks!

Frank-Star-fn commented 8 months ago

I also have the same requirement and hope to obtain the code for the first stage of training.

kkk00714 commented 6 months ago

I have reimplemented the stage 1 training. If you are still interested in it, please contact me.


fangzl123 commented 6 months ago

Hi, have you successfully trained stage 1? I've reimplemented it, but the noise-estimation loss always gets stuck around 1.0. Thanks for any insights.

kkk00714 commented 6 months ago

Try changing the fut_traj size to (b, 1, 2) for training; then save the model and train again with size (b, T, 2) and batch size 250. The loss will settle around 0.12, which is close to the 0.06 of the pretrained model.
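In code, the suggested first phase trains the denoiser on just the first future frame. A small shape sketch (NumPy is used here purely to illustrate the slicing; the repo itself uses PyTorch, where the equivalent is `fut_traj[:, 0, :].unsqueeze(1)`, and the sizes below are illustrative):

```python
import numpy as np

b, T = 32, 20                          # batch of agents, future horizon (illustrative)
fut_traj = np.random.randn(b, T, 2)    # (b, T, 2): T future (x, y) positions

# Phase 1: train the denoiser on the first future frame only -> (b, 1, 2)
fut_first = fut_traj[:, :1, :]         # slice keeps the time axis with length 1

# Phase 2: resume from the phase-1 checkpoint with the full horizon -> (b, T, 2)
print(fut_first.shape, fut_traj.shape)
```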

ShaokangHi commented 6 months ago

Hi, do you mean that implementing stage 1 (the pretrained model) only requires this shape change? @woyoudian2gou

kkk00714 commented 6 months ago

Yes, if you follow the steps I described above, you will get a model close to the pretrained one. Both the diffusion time step T and the batch size affect training.

ShaokangHi commented 6 months ago

@woyoudian2gou Thank you for your reply. Does that mean changing the config (cfg):

past_frames : 29
future_frames: 1
min_past_frames: 29
min_future_frames: 1

And also changing the related parameters in ./trainer/train_led_trajectory_augment_input.py, right?

Looking forward to your reply! Or could you share your related code via Google Drive or another cloud service? Thanks in advance!

kkk00714 commented 6 months ago

No, you should change the shape of fut_traj, e.g. Loss_NE(past_traj, fut_traj[:,0,:].unsqueeze(1), traj_mask). Once you have trained this model, change the batch size to 250 and continue training with the original shape of fut_traj.
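A rough sketch of the two-phase schedule being described, with a stub `loss_ne` standing in for the repo's noise-estimation loss (the real function diffuses fut_traj and scores the predicted noise inside the model/trainer code; everything here, including the data shapes, is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_ne(past_traj, fut_traj, traj_mask):
    """Stub noise-estimation loss: the real trainer diffuses fut_traj,
    predicts the added noise, and returns an MSE on that prediction."""
    noise = rng.standard_normal(fut_traj.shape)
    pred = noise + 0.1 * rng.standard_normal(fut_traj.shape)  # pretend prediction
    return float(np.mean((pred - noise) ** 2))

past_traj = rng.standard_normal((10, 10, 2))   # (b, past_frames, 2)
fut_traj  = rng.standard_normal((10, 20, 2))   # (b, T, 2)
traj_mask = np.ones((10, 10))

# Phase 1: first future frame only (NumPy analogue of fut_traj[:,0,:].unsqueeze(1))
l1 = loss_ne(past_traj, fut_traj[:, 0, :][:, None, :], traj_mask)

# Phase 2: reload the phase-1 checkpoint, full horizon, batch size raised to 250
l2 = loss_ne(past_traj, fut_traj, traj_mask)
print(l1 >= 0 and l2 >= 0)   # True
```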

packer-c commented 6 months ago

Loss_NE(past_traj,fut_traj[:,0,:].unsqueeze(1),traj_mask)

@woyoudian2gou Hello, what does Loss_NE() mean? Where can I find this code? Thanks.

13629281511 commented 6 months ago

Hello, I am confused about how to reimplement the stage 1 training. Could you please leave me your contact information?

percybuttons commented 2 months ago

Hi,

Thank you very much for sharing your unique training method. I find it very interesting! However, I have a few questions for clarification:

  1. You mentioned using fut_traj[:,0,:] during the first training of the diffusion module. Does this mean that only the first frame of fut_traj is used?
  2. If so, how many epochs are required for the first training of the diffusion model?
  3. For the second training, when the full fut_traj is used, how many epochs are required?
  4. Is the learning rate the same as mentioned in the paper?

I am looking forward to your response and appreciate your help. @kkk00714 Best regards
kkk00714 commented 2 months ago

1. Yes. 2, 3 & 4: use the same hyperparameters as mentioned in the paper, and change the batch size to 250 in stage 1.
percybuttons commented 2 months ago

Thank you very much for your response! It resolved my issue and was immensely helpful. Your suggestion to increase the batch size from 10 to 250, processing 250×11 agents at a time, is indeed a sensible configuration for a diffusion model. However, my hardware might not support running such a large batch at once, so I will try a slightly smaller batch size. Once again, I appreciate your reply! @kkk00714

kkk00714 commented 2 months ago

I hope you can successfully replicate the stage 1 training process. The inspiration for changing the size of fut_traj from (b, T, 2) to (b, 1, 2) came from observing that the predicted noise for the same sample was almost identical across all 30 trajectory time steps. I hope this helps with your subsequent adjustments.
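That observation can be checked directly: given the model's predicted noise of shape (b, T, 2), measure how much it varies along the T axis. A small sketch (the `predicted_noise` tensor here is fabricated to mimic the reported near-constant behaviour, not taken from the model):

```python
import numpy as np

rng = np.random.default_rng(1)
b, T = 16, 30

# Fabricate predictions that are nearly constant along the T axis,
# mimicking the reported behaviour of the trained denoiser.
base = rng.standard_normal((b, 1, 2))
predicted_noise = base + 1e-3 * rng.standard_normal((b, T, 2))

# Std along the time axis: near zero means the T predictions collapse
# to one value per sample, so training on (b, 1, 2) loses little signal.
spread = predicted_noise.std(axis=1).mean()
print(spread < 0.01)   # True
```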

percybuttons commented 2 months ago

Thank you for your kind words and positive outlook! It truly is an intriguing finding, and implementing it effectively showcases your talent. The discovery may not be coincidental at all; it is possible that this approach could apply more broadly across diffusion models and yield even better-performing ones. Once again, I appreciate your response and insight; it is invaluable for further advancements.

VanHelen commented 2 weeks ago

Hello, I would like to know how you implemented the first stage of denoising training. Did you use the LED module in the first stage of training? Thank you very much!

kkk00714 commented 2 weeks ago

As described in the paper, the LED module is not used in the first training stage. You just need to use the loss_ne function that comes with the author's code to train on fut_traj after changing its shape as I described earlier.