daisatojp / mpo

PyTorch Implementation of the Maximum a Posteriori Policy Optimisation
GNU General Public License v3.0

How should the hyperparameter alpha be set? #16

Open formoree opened 4 months ago

formoree commented 4 months ago

I am currently running the MPO algorithm in a custom environment, and I often run into gradient explosion: parameters become NaN after passing through a linear layer. So I started checking the hyperparameter settings for problems.
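
For context, this is roughly how I notice the blow-up (an illustrative check from my own training loop; `actor` and the optional gradient clipping are placeholders, not this repository's code):

    import torch

    def check_finite(module, name):
        # Flag any parameter that has turned into NaN/Inf after an update.
        for pname, p in module.named_parameters():
            if not torch.isfinite(p).all():
                print(f'{name}.{pname} contains NaN/Inf values')

    # in the training loop, after loss.backward():
    # torch.nn.utils.clip_grad_norm_(actor.parameters(), max_norm=10.0)  # optional mitigation
    # optimizer.step()
    # check_finite(actor, 'actor')

The code whose defaults I have been examining is below, and it raised some questions for me.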

    parser.add_argument('--alpha_mean_scale', type=float, default=1.0,
                        help='scaling factor of the lagrangian multiplier in the M-step')
    parser.add_argument('--alpha_var_scale', type=float, default=100.0,
                        help='scaling factor of the lagrangian multiplier in the M-step')
    parser.add_argument('--alpha_scale', type=float, default=10.0,
                        help='scaling factor of the lagrangian multiplier in the M-step')
    parser.add_argument('--alpha_mean_max', type=float, default=0.1,
                        help='maximum value of the lagrangian multiplier in the M-step')
    parser.add_argument('--alpha_var_max', type=float, default=10.0,
                        help='maximum value of the lagrangian multiplier in the M-step')
    parser.add_argument('--alpha_max', type=float, default=1.0,
                        help='maximum value of the lagrangian multiplier in the M-step')

Is there a relationship between each alpha and its corresponding alpha_max? What was the author's rationale for choosing these particular defaults?