agrimgupta92 / sgan

Code for "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", Gupta et al., CVPR 2018
MIT License

Training loss does not change, and validation FDE error is super high #63

Open cuihenggang opened 4 years ago

cuihenggang commented 4 years ago

I am trying to train Social-GAN with the code in the repository, but after a few iterations the D loss and G loss get stuck at 1.386 and 0.693, respectively, and never change again.

Also, the validation FDE is 11.058 and does not improve between evaluations.

Am I doing the training correctly?

$ PYTHONPATH=. python scripts/train.py --restore_from_checkpoint 0
[INFO: train.py:  118]: Initializing train dataset
[INFO: train.py:  120]: Train dataset size: 2692
[INFO: train.py:  121]: Initializing val dataset
[INFO: train.py:  129]: There are 21 iterations per epoch
[INFO: train.py:  153]: Here is the generator:
[INFO: train.py:  154]: TrajectoryGenerator(
  (encoder): Encoder(
    (encoder): LSTM(64, 64)
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
  )
  (decoder): Decoder(
    (decoder): LSTM(64, 128)
    (pool_net): PoolHiddenNet(
      (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
      (mlp_pre_pool): Sequential(
        (0): Linear(in_features=192, out_features=512, bias=True)
        (1): ReLU()
        (2): Linear(in_features=512, out_features=1024, bias=True)
        (3): ReLU()
      )
    )
    (mlp): Sequential(
      (0): Linear(in_features=1152, out_features=1024, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1024, out_features=128, bias=True)
      (3): ReLU()
    )
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
    (hidden2pos): Linear(in_features=128, out_features=2, bias=True)
  )
  (pool_net): PoolHiddenNet(
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
    (mlp_pre_pool): Sequential(
      (0): Linear(in_features=128, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=1024, bias=True)
      (3): ReLU()
    )
  )
  (mlp_decoder_context): Sequential(
    (0): Linear(in_features=1088, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=128, bias=True)
    (3): ReLU()
  )
)
[INFO: train.py:  169]: Here is the discriminator:
[INFO: train.py:  170]: TrajectoryDiscriminator(
  (encoder): Encoder(
    (encoder): LSTM(64, 64)
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
  )
  (real_classifier): Sequential(
    (0): Linear(in_features=64, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=1, bias=True)
    (3): ReLU()
  )
)
[INFO: train.py:  233]: Starting epoch 1
[INFO: train.py:  278]: t = 1 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.326
[INFO: train.py:  280]:   [D] D_total_loss: 1.326
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 6 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.098
[INFO: train.py:  280]:   [D] D_total_loss: 1.098
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 11 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 0.994
[INFO: train.py:  280]:   [D] D_total_loss: 0.994
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.687
[INFO: train.py:  283]:   [G] G_total_loss: 0.687
[INFO: train.py:  233]: Starting epoch 2
[INFO: train.py:  278]: t = 16 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.097
[INFO: train.py:  280]:   [D] D_total_loss: 1.097
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.462
[INFO: train.py:  283]:   [G] G_total_loss: 0.462
[INFO: train.py:  278]: t = 21 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.669
[INFO: train.py:  280]:   [D] D_total_loss: 1.669
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.534
[INFO: train.py:  283]:   [G] G_total_loss: 0.534
[INFO: train.py:  278]: t = 26 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 3
[INFO: train.py:  278]: t = 31 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 36 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 41 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 4
[INFO: train.py:  278]: t = 46 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 51 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 56 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 5
[INFO: train.py:  278]: t = 61 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 66 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 6
[INFO: train.py:  278]: t = 71 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 76 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 81 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 7
[INFO: train.py:  278]: t = 86 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 91 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 96 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 8
[INFO: train.py:  278]: t = 101 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.871
[INFO: train.py:  305]:   [val] ade_nl: 14.038
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.913
[INFO: train.py:  308]:   [train] ade_l: 16.727
[INFO: train.py:  308]:   [train] ade_nl: 15.018
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.870
[INFO: train.py:  308]:   [train] fde_l: 25.090
[INFO: train.py:  308]:   [train] fde_nl: 22.527
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.713
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.713
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.
[INFO: train.py:  343]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_no_model.pt
[INFO: train.py:  354]: Done.
[INFO: train.py:  278]: t = 106 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 111 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 9
[INFO: train.py:  278]: t = 116 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 121 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 126 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 10
[INFO: train.py:  278]: t = 131 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 136 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 11
[INFO: train.py:  278]: t = 141 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 146 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 151 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 12
[INFO: train.py:  278]: t = 156 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 161 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 166 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 13
[INFO: train.py:  278]: t = 171 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 176 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 181 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 14
[INFO: train.py:  278]: t = 186 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 191 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 196 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 15
[INFO: train.py:  278]: t = 201 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.870
[INFO: train.py:  305]:   [val] ade_nl: 14.037
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.910
[INFO: train.py:  308]:   [train] ade_l: 16.640
[INFO: train.py:  308]:   [train] ade_nl: 15.079
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.827
[INFO: train.py:  308]:   [train] fde_l: 24.878
[INFO: train.py:  308]:   [train] fde_nl: 22.545
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.704
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.704
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.
[INFO: train.py:  343]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_no_model.pt
[INFO: train.py:  354]: Done.
[INFO: train.py:  278]: t = 206 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 16
[INFO: train.py:  278]: t = 211 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 216 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 221 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 17
[INFO: train.py:  278]: t = 226 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 231 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 236 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 18
[INFO: train.py:  278]: t = 241 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 246 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 251 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 19
[INFO: train.py:  278]: t = 256 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 261 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 266 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 20
[INFO: train.py:  278]: t = 271 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 276 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 21
[INFO: train.py:  278]: t = 281 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 286 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 291 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 22
[INFO: train.py:  278]: t = 296 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 301 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.870
[INFO: train.py:  305]:   [val] ade_nl: 14.037
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.797
[INFO: train.py:  308]:   [train] ade_l: 16.335
[INFO: train.py:  308]:   [train] ade_nl: 14.918
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.681
[INFO: train.py:  308]:   [train] fde_l: 24.471
[INFO: train.py:  308]:   [train] fde_nl: 22.348
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.124
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.124
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.
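A note on the plateau values in the log above: 0.693 is ln 2 and 1.386 is 2 ln 2. Those are the values binary cross-entropy produces when a classifier emits a logit of 0 (probability 0.5) for every input. The sketch below checks the arithmetic in plain Python. It assumes the D loss is the sum of a "real" and a "fake" BCE term and the G loss is a single BCE term; `bce_with_logits` is an illustrative helper, not a function from the repository.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Assumption: the discriminator outputs a logit of exactly 0 for every sample.
d_loss = bce_with_logits(0.0, 1.0) + bce_with_logits(0.0, 0.0)  # real + fake
g_loss = bce_with_logits(0.0, 1.0)                              # fool D

# At logit 0 the target value drops out, so smoothed labels change nothing.
smoothed = bce_with_logits(0.0, 0.9)

print(round(d_loss, 3), round(g_loss, 3))  # 1.386 0.693
```

If this reading is right, the constant losses would mean the discriminator has stopped distinguishing real from generated trajectories, rather than that training has converged.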
ntruongv commented 4 years ago

Hi, I have the same problem. Did you end up fixing this issue? Did training until t = 4200 help?

cuihenggang commented 4 years ago

No luck :(

ZhoubinXM commented 4 years ago

Which activation are you using: ReLU or LeakyReLU?
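One possible reading of this comment, not confirmed anywhere in the thread: the discriminator printed in the log above ends its real_classifier with a ReLU. If the value entering that ReLU is negative, the output is 0 and the gradient through it is also 0, so the score stays at 0 (sigmoid(0) = 0.5) and no update reaches the earlier layers. A leaky ReLU keeps a small slope for negative inputs. The sketch below shows the difference in plain Python; the slope of 0.2 and the input value are illustrative assumptions.

```python
def relu(x):
    return max(x, 0.0)


def relu_grad(x):
    # Derivative of ReLU: zero for every non-positive input.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def leaky_relu_grad(x, slope=0.2):
    # Derivative of leaky ReLU: a small but non-zero slope for negative inputs.
    return 1.0 if x > 0 else slope


pre_activation = -1.5  # hypothetical value entering the final activation

print("ReLU:      ", relu(pre_activation), relu_grad(pre_activation))
print("Leaky ReLU:", leaky_relu(pre_activation), leaky_relu_grad(pre_activation))
```

With plain ReLU both the output and the gradient are zero for this input, so a gradient step cannot move the score; with the leaky variant the gradient is the chosen slope, so the layer can still recover.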

munila commented 4 years ago

Did you try a larger learning rate, e.g. 1e-3? Try reusing the hyperparameters from run_traj.sh; that may fix your problem.

cuihenggang commented 4 years ago

I have finally figured out the issue. You need to train with the run_traj.sh script. The default arguments in train.py don't work. The most important argument is --l2_loss_weight 1 which adds the L2 loss to the generator. Social-GAN needs the L2 loss to train and doesn't work with GAN loss only.
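A rough sketch of what this flag changes, assuming the generator's total loss is the adversarial term plus l2_loss_weight times the L2 term. That assumption matches the log above, where G_total_loss always equals G_discriminator_loss. The helper name is hypothetical, and the two input numbers are taken from the log only to give illustrative magnitudes.

```python
def generator_total_loss(adv_loss, l2_loss, l2_loss_weight):
    """Hypothetical helper: adversarial term plus a weighted L2 term."""
    return adv_loss + l2_loss_weight * l2_loss


adv_loss = 0.693   # G_discriminator_loss from the log
l2_loss = 22.713   # [train] g_l2_loss_rel from the log, illustrative only

gan_only = generator_total_loss(adv_loss, l2_loss, l2_loss_weight=0.0)
with_l2 = generator_total_loss(adv_loss, l2_loss, l2_loss_weight=1.0)

print(f"l2_loss_weight=0: {gan_only:.3f}")
print(f"l2_loss_weight=1: {with_l2:.3f}")
```

With a weight of 0 the generator learns only through the discriminator, so a stalled discriminator leaves it without a useful signal. With a weight of 1 the L2 term compares predictions to ground truth directly and can keep improving on its own.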

Viozer commented 4 years ago

> I have finally figured out the issue. You need to train with the run_traj.sh script. The default arguments in train.py don't work. The most important argument is --l2_loss_weight 1 which adds the L2 loss to the generator. Social-GAN needs the L2 loss to train and doesn't work with GAN loss only.

Hi, I have the same problem. I set --l2_loss_weight to 1, but the D loss and G loss still stay unchanged (1.386 and 0.693, respectively), while the L2 loss keeps changing. Do you know how to fix it?