chnsh / DCRNN_PyTorch

Diffusion Convolutional Recurrent Neural Network Implementation in PyTorch

it doesn't seem to improve in the test run #25

Open jerronl opened 9 months ago

jerronl commented 9 months ago

I test-ran the code in Google Colab, and so far I got the following output:


2024-02-06 18:09:14,584 - INFO - Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_192_0206180913/

INFO:model.pytorch.dcrnn_supervisor:Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_192_0206180913/

2024-02-06 18:09:35,626 - INFO - Model created

INFO:model.pytorch.dcrnn_supervisor:Model created

2024-02-06 18:09:38,948 - INFO - Loaded model at 50

INFO:model.pytorch.dcrnn_supervisor:Loaded model at 50

2024-02-06 18:09:40,199 - INFO - Start training ...

INFO:model.pytorch.dcrnn_supervisor:Start training ...

2024-02-06 18:09:40,204 - INFO - num_batches:125

INFO:model.pytorch.dcrnn_supervisor:num_batches:125

2024-02-06 18:18:39,040 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-06 18:18:39,045 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!

2024-02-06 18:19:24,359 - INFO - Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s

/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:432: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
INFO:model.pytorch.dcrnn_supervisor:Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s

2024-02-06 18:19:24,384 - INFO - Saved model at 50

INFO:model.pytorch.dcrnn_supervisor:Saved model at 50

2024-02-06 18:19:24,391 - INFO - Val loss decrease from inf to 2.9198, saving to models/epo50.tar

INFO:model.pytorch.dcrnn_supervisor:Val loss decrease from inf to 2.9198, saving to models/epo50.tar

2024-02-06 18:28:27,688 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-06 18:28:27,692 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!
...
2024-02-06 21:27:25,031 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s

INFO:model.pytorch.dcrnn_supervisor:Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s

2024-02-06 21:28:55,730 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499,  lr: 0.000100, 589.0s

INFO:model.pytorch.dcrnn_supervisor:Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499,  lr: 0.000100, 589.0s

2024-02-06 21:37:59,490 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-06 21:37:59,494 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!

2024-02-06 21:38:44,803 - INFO - Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s

INFO:model.pytorch.dcrnn_supervisor:Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s

2024-02-06 21:38:44,823 - INFO - Saved model at 70

INFO:model.pytorch.dcrnn_supervisor:Saved model at 70

2024-02-06 21:38:44,827 - INFO - Val loss decrease from 2.9198 to 2.9033, saving to models/epo70.tar

INFO:model.pytorch.dcrnn_supervisor:Val loss decrease from 2.9198 to 2.9033, saving to models/epo70.tar

2024-02-06 21:47:48,164 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-06 21:47:48,169 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!

2024-02-06 21:48:33,495 - INFO - Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s

INFO:model.pytorch.dcrnn_supervisor:Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s

2024-02-06 21:57:36,690 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-06 21:57:36,698 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!
...
2024-02-06 23:57:38,667 - INFO - Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s

INFO:model.pytorch.dcrnn_supervisor:Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s

2024-02-07 00:06:42,161 - INFO - epoch complete

INFO:model.pytorch.dcrnn_supervisor:epoch complete

2024-02-07 00:06:42,165 - INFO - evaluating now!

INFO:model.pytorch.dcrnn_supervisor:evaluating now!

2024-02-07 00:07:27,510 - INFO - Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s

INFO:model.pytorch.dcrnn_supervisor:Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s

The validation and test results don't seem to be improving, and the lr stayed unchanged. Is this expected, and will they get better before the 100 epochs are done? Or is something wrong? Thanks!
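
To make the trend easier to read, the per-epoch numbers can be pulled out of the log with a short script. This is only a minimal sketch, assuming the "Epoch [..] ... train_mae ... val_mae ... lr" line format shown above; `parse_epochs` and `SAMPLE` are hypothetical names for illustration, not part of DCRNN_PyTorch, and `SAMPLE` just repeats a few lines copied from the log.

```python
import re

# Matches epoch summary lines such as:
# "... Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s"
# Lines reporting test_mae instead of val_mae are intentionally not matched.
EPOCH_RE = re.compile(
    r"Epoch \[(\d+)/\d+\] \(\d+\) train_mae: ([\d.]+), "
    r"val_mae: ([\d.]+), lr: ([\d.]+)"
)


def parse_epochs(log_text):
    """Return {epoch: (train_mae, val_mae, lr)} from raw log text.

    Each message appears twice in the log (two logging handlers), so the
    results are keyed by epoch and duplicates collapse into one entry.
    """
    rows = {}
    for m in EPOCH_RE.finditer(log_text):
        rows[int(m.group(1))] = (
            float(m.group(2)),
            float(m.group(3)),
            float(m.group(4)),
        )
    return rows


# A few lines copied from the log above (hypothetical variable name).
SAMPLE = """\
2024-02-06 18:19:24,359 - INFO - Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s
INFO:model.pytorch.dcrnn_supervisor:Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s
2024-02-06 21:27:25,031 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s
2024-02-06 21:28:55,730 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499,  lr: 0.000100, 589.0s
2024-02-06 21:38:44,803 - INFO - Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s
2024-02-06 21:48:33,495 - INFO - Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s
2024-02-06 23:57:38,667 - INFO - Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s
2024-02-07 00:07:27,510 - INFO - Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s
"""

# Print one row per epoch so val_mae and the logged lr can be compared directly.
for epoch, (train_mae, val_mae, lr) in sorted(parse_epochs(SAMPLE).items()):
    print(f"epoch {epoch:3d}  train_mae {train_mae:.4f}  val_mae {val_mae:.4f}  lr {lr:.6f}")
```

In practice the full log file could be read into a string and passed to `parse_epochs` in place of `SAMPLE`.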