The Brier score in my training is not decreasing, while all the losses and the other scores seem to be converging to the reported values. The values below were recorded after 68 hours of training on an A100 with 100 GB of RAM.
The training hyperparameters are:
--num_workers=32 --train_batch_size=16 --val_batch_size=16 --test_batch_size=16 --max_epoch=50 --pl2pl_radius=150 --time_span=10 --pl2a_radius=50 --a2a_radius=50 --num_t2m_steps=30 --pl2m_radius=150 --a2m_radius=150
Current training status:
Epoch: 32
Training duration: 68 hours
Brier: 0.61 (oscillating near this value since the beginning of training)
MR (k=6): 0.1824
minADE (k=6): 0.751
minFDE (k=6): 1.359
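To put the 0.61 in context, here is a minimal sketch of how I am reading the metric. It assumes the logged Brier value is the squared error (1 - p)^2 on the probability p assigned to the best of the k=6 modes, as in the Argoverse 2 brier-minFDE definition. That is my assumption about the metric, not something taken from the training code, and the helper names are hypothetical. The inversion also treats the logged mean as if every sample had the same p, so it is only a rough estimate.

```python
import math


def brier_term(p_best: float) -> float:
    """Squared error between the best mode's probability and 1 (assumed definition)."""
    return (1.0 - p_best) ** 2


def implied_best_mode_probability(brier: float) -> float:
    """Invert the assumed definition: the p that would produce a given Brier term."""
    return 1.0 - math.sqrt(brier)


k = 6  # number of predicted modes, matching the k=6 metrics above

# Baseline: a model that spreads probability evenly over all k modes.
uniform = brier_term(1.0 / k)

# Rough probability on the best mode implied by the logged value of 0.61.
implied = implied_best_mode_probability(0.61)

print(f"Brier term with uniform probabilities over {k} modes: {uniform:.3f}")
print(f"Best-mode probability implied by a Brier term of 0.61: {implied:.3f}")
```

Under that assumption, 0.61 is only slightly below the uniform baseline for six modes, which would mean the mode probabilities are barely being learned even though the trajectories themselves keep improving.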
I am also attaching the Tensorboard checkpoint for further analysis. Please let me know if anything else is required.
Checkpoint.zip