I test-ran the code in Google Colab, and so far I got the following output:
2024-02-06 18:09:14,584 - INFO - Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_192_0206180913/
INFO:model.pytorch.dcrnn_supervisor:Log directory: data/model/dcrnn_DR_2_h_12_64-64_lr_0.01_bs_192_0206180913/
2024-02-06 18:09:35,626 - INFO - Model created
INFO:model.pytorch.dcrnn_supervisor:Model created
2024-02-06 18:09:38,948 - INFO - Loaded model at 50
INFO:model.pytorch.dcrnn_supervisor:Loaded model at 50
2024-02-06 18:09:40,199 - INFO - Start training ...
INFO:model.pytorch.dcrnn_supervisor:Start training ...
2024-02-06 18:09:40,204 - INFO - num_batches:125
INFO:model.pytorch.dcrnn_supervisor:num_batches:125
2024-02-06 18:18:39,040 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-06 18:18:39,045 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
2024-02-06 18:19:24,359 - INFO - Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:432: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
warnings.warn("To get the last learning rate computed by the scheduler, "
INFO:model.pytorch.dcrnn_supervisor:Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s
2024-02-06 18:19:24,384 - INFO - Saved model at 50
INFO:model.pytorch.dcrnn_supervisor:Saved model at 50
2024-02-06 18:19:24,391 - INFO - Val loss decrease from inf to 2.9198, saving to models/epo50.tar
INFO:model.pytorch.dcrnn_supervisor:Val loss decrease from inf to 2.9198, saving to models/epo50.tar
2024-02-06 18:28:27,688 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-06 18:28:27,692 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
...
2024-02-06 21:27:25,031 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s
INFO:model.pytorch.dcrnn_supervisor:Epoch [69/100] (8750) train_mae: 1.9429, val_mae: 2.9616, lr: 0.000100, 589.0s
2024-02-06 21:28:55,730 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499, lr: 0.000100, 589.0s
INFO:model.pytorch.dcrnn_supervisor:Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499, lr: 0.000100, 589.0s
2024-02-06 21:37:59,490 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-06 21:37:59,494 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
2024-02-06 21:38:44,803 - INFO - Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s
INFO:model.pytorch.dcrnn_supervisor:Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s
2024-02-06 21:38:44,823 - INFO - Saved model at 70
INFO:model.pytorch.dcrnn_supervisor:Saved model at 70
2024-02-06 21:38:44,827 - INFO - Val loss decrease from 2.9198 to 2.9033, saving to models/epo70.tar
INFO:model.pytorch.dcrnn_supervisor:Val loss decrease from 2.9198 to 2.9033, saving to models/epo70.tar
2024-02-06 21:47:48,164 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-06 21:47:48,169 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
2024-02-06 21:48:33,495 - INFO - Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s
INFO:model.pytorch.dcrnn_supervisor:Epoch [71/100] (9000) train_mae: 1.9262, val_mae: 2.9057, lr: 0.001000, 588.7s
2024-02-06 21:57:36,690 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-06 21:57:36,698 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
...
2024-02-06 23:57:38,667 - INFO - Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s
INFO:model.pytorch.dcrnn_supervisor:Epoch [84/100] (10625) train_mae: 1.9336, val_mae: 2.9073, lr: 0.000100, 588.8s
2024-02-07 00:06:42,161 - INFO - epoch complete
INFO:model.pytorch.dcrnn_supervisor:epoch complete
2024-02-07 00:06:42,165 - INFO - evaluating now!
INFO:model.pytorch.dcrnn_supervisor:evaluating now!
2024-02-07 00:07:27,510 - INFO - Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s
INFO:model.pytorch.dcrnn_supervisor:Epoch [85/100] (10750) train_mae: 1.9430, val_mae: 2.9063, lr: 0.000100, 588.8s
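To see how val_mae and lr actually move from epoch to epoch, the per-epoch summary lines can be pulled out of the full log. The sketch below assumes only the line format visible above. `parse_epochs` is a hypothetical helper, not part of the repo. Each summary line is logged twice (once per handler), so the duplicate simply overwrites the same epoch entry.

```python
import re

# Matches summary lines such as:
# "... Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s"
EPOCH_RE = re.compile(
    r"Epoch \[(\d+)/\d+\] \(\d+\) train_mae: ([\d.]+), "
    r"(val|test)_mae: ([\d.]+), lr: ([\d.]+)"
)


def parse_epochs(lines):
    """Hypothetical helper: map epoch -> {train_mae, val_mae/test_mae, lr}."""
    epochs = {}
    for line in lines:
        m = EPOCH_RE.search(line)
        if m is None:
            continue  # not an epoch summary line
        epoch, train_mae, split, mae, lr = m.groups()
        row = epochs.setdefault(int(epoch), {})
        row["train_mae"] = float(train_mae)
        row[f"{split}_mae"] = float(mae)
        row["lr"] = float(lr)
    return epochs


# A few lines copied from the log above; a log file object works the same way.
sample = [
    "2024-02-06 18:19:24,359 - INFO - Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s",
    "INFO:model.pytorch.dcrnn_supervisor:Epoch [50/100] (6375) train_mae: 1.9753, val_mae: 2.9198, lr: 0.010000, 584.1s",
    "2024-02-06 21:28:55,730 - INFO - Epoch [69/100] (8750) train_mae: 1.9429, test_mae: 3.2499, lr: 0.000100, 589.0s",
    "2024-02-06 21:38:44,803 - INFO - Epoch [70/100] (8875) train_mae: 1.9318, val_mae: 2.9033, lr: 0.001000, 589.1s",
]
for epoch, row in sorted(parse_epochs(sample).items()):
    print(epoch, row)
```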
The validation and test results don't seem to be improving, and the lr stayed unchanged. Is this expected, and will they get better before epoch 100? Or is something wrong?
Thanks!
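The UserWarning in the log points to `get_last_lr()`, the PyTorch scheduler method that returns the current learning rate(s) in a live session. To check whether the lr printed in the log matches what a step-decay schedule should give, the expected value can also be computed directly. This sketch assumes a `torch.optim.lr_scheduler.MultiStepLR`-style schedule, where the rate is multiplied by `gamma` once for every milestone already passed. The base lr, milestones, and gamma below are hypothetical and should be replaced with the values from the actual config.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Expected lr at `epoch` under a MultiStepLR-style step decay.

    Equals base_lr * gamma ** (number of milestones <= epoch).
    """
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Hypothetical settings for illustration only; check the real config file.
BASE_LR, MILESTONES, GAMMA = 0.01, [20, 30, 40, 50], 0.1
for epoch in (10, 20, 50, 70, 85):
    print(epoch, multistep_lr(BASE_LR, MILESTONES, GAMMA, epoch))
```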