I am not able to train a similarly performing model after changing all MLP activations to LeakyReLU (a change I made because of the observed issue with unchanging loss).
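For reference, the activation swap itself is minimal. This is a sketch assuming the MLPs are built by a `make_mlp`-style helper (my paraphrase of its shape, not the repo's verbatim code):

```python
import torch.nn as nn

def make_mlp(dim_list, activation='leakyrelu', batch_norm=True, dropout=0.0):
    # Sketch of an MLP builder; only the activation branch is what I changed.
    layers = []
    for dim_in, dim_out in zip(dim_list[:-1], dim_list[1:]):
        layers.append(nn.Linear(dim_in, dim_out))
        if batch_norm:
            layers.append(nn.BatchNorm1d(dim_out))
        if activation == 'relu':
            layers.append(nn.ReLU())
        elif activation == 'leakyrelu':
            layers.append(nn.LeakyReLU())  # the swap; default negative_slope=0.01
        if dropout > 0:
            layers.append(nn.Dropout(p=dropout))
    return nn.Sequential(*layers)
```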
At first, I used the hyperparameters from `scripts/run_traj.sh` and tried both pooling modules (although the default 2 m neighborhood is probably not sensible for social pooling), getting to a validation ADE of ~0.7 and FDE of ~1.5 at around epoch 70 for both `eth` and `zara1`, shortly before the discriminator overpowered the generator and the whole model diverged.
While the best-model selection mechanism looks robust to this divergence issue, I would have expected the loss trends to be monitored more thoroughly. It is also pointless to keep training the model after it has substantially diverged.
I have since experimented with many hyperparameter settings, and a larger batch size (128 sequences per batch) seems to stabilize training the most, but I still cannot get below the evaluation metric values mentioned above. Even if I manage to train the model for 200-500 epochs with stable GAN losses (and a diminishing `D_loss_real`), the predictions suffer from a pronounced directional bias: the trajectories of all pedestrians try to turn toward the same heading, which stays constant across different inputs.
I have stopped worrying about the ADE and FDE metrics under the N > 1 setting because of the issues discussed here. Unfortunately, with N = 1 the metrics are too noisy... Perhaps I will just take the average instead of the minimum in `evaluate_helper` of `scripts/evaluate_model.py` and also focus on collisions.
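Concretely, the change I have in mind would look roughly like this (a sketch assuming `evaluate_helper` reduces per-sequence errors over the N sampled predictions; the variable names are mine):

```python
import torch

def evaluate_helper(error, seq_start_end):
    # error: list of N tensors, one per sampled prediction, each of shape (num_peds,)
    total = 0
    error = torch.stack(error, dim=1)  # (num_peds, N)
    for start, end in seq_start_end:
        per_sample = error[start.item():end.item()].sum(dim=0)  # (N,), summed over peds
        total += torch.mean(per_sample)  # was: torch.min(per_sample)
    return total
```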
Update: I have also tried the (correct) hyperparameters extracted from the pretrained models via `scripts/print_args.py`. The vast majority of independent training runs also diverge within 50 epochs. Here, I have noticed that weakening the discriminator via `--d_type local` helps stabilize the losses. Since that way the discriminator does not capture social interactions at all, I am now also trying higher `--g_steps`/`--d_steps` ratios.
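To make the ratio idea concrete, here is a self-contained toy of the alternation I assume `--g_steps`/`--d_steps` control (stand-in linear models, not the repo's actual training loop):

```python
import torch
import torch.nn as nn

G, D = nn.Linear(8, 2), nn.Linear(2, 1)  # stand-in generator/discriminator
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
g_steps, d_steps = 2, 1  # a higher g/d ratio gives the generator more updates

for _ in range(100):  # training iterations
    z, real = torch.randn(64, 8), torch.randn(64, 2)
    for _ in range(d_steps):  # discriminator updates first
        d_opt.zero_grad()
        loss_d = (bce(D(real), torch.ones(64, 1))
                  + bce(D(G(z).detach()), torch.zeros(64, 1)))
        loss_d.backward()
        d_opt.step()
    for _ in range(g_steps):  # then generator updates
        g_opt.zero_grad()
        loss_g = bce(D(G(z)), torch.ones(64, 1))
        loss_g.backward()
        g_opt.step()
```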
Hi, I have a similar problem. I tried to reproduce the results by just running `train.py` with no changes to the model, but the ADE and FDE are very large.
I will check my log file to see what happened.