czq142857 / IM-NET-pytorch

PyTorch 1.2 implementation of IM-NET.

training loss keeps increasing for SVR task #14

lx709 closed this issue 3 years ago.

lx709 commented 3 years ago

Dear authors, thanks for sharing the code for your wonderful work. I followed the instructions in README.txt to run the SVR experiments. The training loss decreases steadily for the first ten epochs (from about 0.020 to 0.010), then suddenly jumps to about 0.050 at epoch 10 and stays at that value. Please see the log below. Could you help me figure out the problem? Thanks.


2021-07-22 09:53:21,611 - Model - INFO - PARAMETER ...
2021-07-22 09:53:21,611 - Model - INFO - Namespace(ae=False, beta1=0.5, checkpoint_dir='checkpoint_v0', data_dir='./data/all_vox256_img/', dataset='all_vox256_img', end=16, epoch=1000, getz=False, iteration=0, learning_rate=5e-05, sample_dir='samples/all_vox256_img1_v0', sample_vox_size=64, start=0, svr=True, train=True)
2021-07-22 09:53:21,611 - Model - INFO -
----------net summary----------
2021-07-22 09:53:21,611 - Model - INFO - training samples 35019
2021-07-22 09:53:21,611 - Model - INFO - -------------------------------

2021-07-22 09:54:16,795 - Model - INFO - Epoch: [ 0/1000] time: 55.0997, loss: 0.01976688
2021-07-22 09:55:14,028 - Model - INFO - Epoch: [ 1/1000] time: 112.3317, loss: 0.01464148
2021-07-22 09:56:11,519 - Model - INFO - Epoch: [ 2/1000] time: 169.8244, loss: 0.01332228
2021-07-22 09:57:08,826 - Model - INFO - Epoch: [ 3/1000] time: 227.1289, loss: 0.01255551
2021-07-22 09:58:06,521 - Model - INFO - Epoch: [ 4/1000] time: 284.8244, loss: 0.01195736
2021-07-22 09:59:03,822 - Model - INFO - Epoch: [ 5/1000] time: 342.1306, loss: 0.01152219
2021-07-22 10:00:01,491 - Model - INFO - Epoch: [ 6/1000] time: 399.7950, loss: 0.01113618
2021-07-22 10:00:58,809 - Model - INFO - Epoch: [ 7/1000] time: 457.1126, loss: 0.01082771
2021-07-22 10:01:56,299 - Model - INFO - Epoch: [ 8/1000] time: 514.6026, loss: 0.01057385
2021-07-22 10:02:53,894 - Model - INFO - Epoch: [ 9/1000] time: 572.1980, loss: 0.01034075
2021-07-22 10:03:50,362 - Model - INFO - Epoch: [10/1000] time: 628.6637, loss: 0.04968767
2021-07-22 10:04:51,521 - Model - INFO - Epoch: [11/1000] time: 689.7292, loss: 0.05001253
2021-07-22 10:06:41,706 - Model - INFO - Epoch: [12/1000] time: 799.8922, loss: 0.05001420
2021-07-22 10:08:32,360 - Model - INFO - Epoch: [13/1000] time: 910.5690, loss: 0.05001350
2021-07-22 10:10:23,517 - Model - INFO - Epoch: [14/1000] time: 1021.7215, loss: 0.05001152
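For longer runs, a jump like the one at epoch 10 can be found automatically by parsing the per-epoch lines. The following is a minimal sketch, not part of IM-NET: `parse_losses`, `find_loss_spikes`, and the 2x threshold are hypothetical helpers. The regular expression assumes the exact `Epoch: [ n/N] time: t, loss: l` format shown in the log, with the loss written as a plain decimal.

```python
import re

# Assumes per-epoch lines formatted exactly like the log in this issue, e.g.
# "... - Model - INFO - Epoch: [10/1000] time: 628.6637, loss: 0.04968767"
# (plain decimal loss; scientific notation such as 1e-05 would not match).
EPOCH_RE = re.compile(r"Epoch: \[\s*(\d+)/\s*\d+\] time: [\d.]+, loss: ([\d.]+)")


def parse_losses(log_text):
    """Hypothetical helper: return (epoch, loss) pairs found in the log text."""
    return [(int(m.group(1)), float(m.group(2))) for m in EPOCH_RE.finditer(log_text)]


def find_loss_spikes(losses, ratio=2.0):
    """Hypothetical helper: return epochs whose loss exceeds the previous
    epoch's loss by more than `ratio` times (2x is an arbitrary threshold)."""
    spikes = []
    for (_, prev), (epoch, cur) in zip(losses, losses[1:]):
        if cur > ratio * prev:
            spikes.append(epoch)
    return spikes


log = """
2021-07-22 10:02:53,894 - Model - INFO - Epoch: [ 9/1000] time: 572.1980, loss: 0.01034075
2021-07-22 10:03:50,362 - Model - INFO - Epoch: [10/1000] time: 628.6637, loss: 0.04968767
2021-07-22 10:04:51,521 - Model - INFO - Epoch: [11/1000] time: 689.7292, loss: 0.05001253
"""
losses = parse_losses(log)
print(find_loss_spikes(losses))  # [10]
```

Only the jump into the plateau is flagged. The plateau itself (0.0497 to 0.0500) stays well under the threshold.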

czq142857 commented 3 years ago

I have not run into this before. The MSE loss is usually very stable and should not jump from 0.01 to 0.05 like that. Could you run it again and see whether you can reproduce the issue?

lx709 commented 3 years ago

Thanks for your attention. I found that this was probably caused by using a learning rate that is too large for training; a smaller learning rate alleviates the problem.
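The toy example below illustrates the general mechanism of a step size that is too large making the loss grow instead of shrink. It is a sketch only: it runs plain gradient descent on f(x) = x², not IM-NET's Adam-based training. Real networks are not quadratics, so this does not show that the learning rate used in the log above diverges for IM-NET specifically. The function name `gd_losses` is hypothetical.

```python
def gd_losses(lr, x0=1.0, steps=5):
    """Toy example (not IM-NET): plain gradient descent on f(x) = x**2.

    The update x <- x - lr * f'(x) equals (1 - 2*lr) * x, so the loss
    shrinks when 0 < lr < 1 and grows when lr > 1.
    Returns the loss f(x) after each step.
    """
    x = x0
    losses = []
    for _ in range(steps):
        x = x - lr * 2 * x  # gradient of x**2 is 2x
        losses.append(x * x)
    return losses


small = gd_losses(lr=0.1)  # each step multiplies x by 0.8: loss decreases
large = gd_losses(lr=1.5)  # each step multiplies x by -2: loss grows
print(large)  # [4.0, 16.0, 64.0, 256.0, 1024.0]
```

Lowering the step size moves the update factor |1 - 2·lr| back below 1, which is the toy analogue of the fix reported here.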