Layne-Huang / PMDM


Training loss does not decrease #15

Open · Xinheng-He opened this issue 2 months ago

Xinheng-He commented 2 months ago

Hi there, I'm reproducing the training code on the CrossDocked dataset, but the training loss stays at ~1200 after 50 epochs. I dug into the code and found that in ./models/epsnet/MDM_pocket_coor_shared.py, F.mse_loss(pos_eq_global + pos_eq_local, target_pos_global + target_pos_local, reduction='none') computes the loss between the output of self.net and the noise-perturbed input, but in my opinion it should be between the output of self.net and the original input. Any comments from the developers?

Layne-Huang commented 2 months ago

Hi,

In diffusion models, you can predict either the noise or $x_0$ to calculate $\mu_{\theta}(G_t, t)$. For the details, you can refer to our paper or to DDPM.

In practice, the noise-prediction loss decreases slowly after the early epochs, which is normal. If you would like to reproduce our results, you need to train the model for 500 epochs.
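
For illustration only, here is a minimal PyTorch sketch (not PMDM code; the model, beta schedule, and shapes are placeholders) of the two parameterizations mentioned above: regressing the injected noise versus regressing the clean coordinates $x_0$. Either choice yields a valid $\mu_{\theta}(G_t, t)$; only the regression target changes.

```python
# Minimal DDPM-style loss sketch (illustrative, not PMDM code).
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0, t, predict_noise=True):
    """x0: clean positions [N, 3]; t: integer timestep in [0, T)."""
    a_bar = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    pred = model(x_t, t)
    if predict_noise:
        # eps-parameterization: regress the injected noise.
        return F.mse_loss(pred, eps)
    # x0-parameterization: regress the clean input directly.
    return F.mse_loss(pred, x0)

# Usage with a dummy model that ignores t:
model = lambda x, t: torch.zeros_like(x)
print(diffusion_loss(model, torch.randn(8, 3), t=500))
```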

Xinheng-He commented 2 months ago

Thanks, I'll read it. Will the model converge faster if it is trained to predict the original data rather than the noise?

Also, line 713 of MDM_pocket_coor_shared.py returns loss, loss_pos, loss_pos, loss_node, loss_node, but the training script unpacks these as loss, loss_global, loss_local, loss_node_global, loss_node_local. As a result, loss_global and loss_local are both loss_pos, and loss_node_global and loss_node_local are both loss_node. This may be a minor bug; see the sketch below.
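
For concreteness, a tiny hypothetical sketch of the mismatch (variable names are taken from this thread, not verified against the repository):

```python
# Hypothetical illustration of the reported return-value duplication
# (names from this discussion, not verified against the PMDM source).
import torch

def get_loss_as_described(loss, loss_pos, loss_node):
    # What line 713 reportedly returns: position and node losses duplicated.
    return loss, loss_pos, loss_pos, loss_node, loss_node

total, pos, node = torch.tensor(1.0), torch.tensor(0.7), torch.tensor(0.3)

# How the training script unpacks the tuple.
loss, loss_global, loss_local, loss_node_global, loss_node_local = \
    get_loss_as_described(total, pos, node)

assert torch.equal(loss_global, loss_local)            # both are loss_pos
assert torch.equal(loss_node_global, loss_node_local)  # both are loss_node
```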

Layne-Huang commented 2 months ago

It will converge faster if the original data is sparse. You could try it and see whether it converges faster on our dataset! Yes, you are right; we will update the code later. Thanks for your suggestion!

Xinheng-He commented 2 months ago

Thanks, I'll keep trying.

Xinheng-He commented 2 months ago

Sorry to bother you again, but I've trained for 500 epochs on a toy set of 500 CrossDocked examples. The loss plateaus after about 150 epochs and is still high for positions (around 750). May I ask how large the loss is after 500 epochs of training on the full CrossDocked dataset? Thanks again for your help. [attached: screenshot of the training loss curve]

Layne-Huang commented 2 months ago

I trained PMDM on the whole dataset for 500 epochs; you do not need to train for 500 epochs on such a small subset. It is normal for the loss to oscillate in the later stages.