I am currently running the MPO algorithm in a custom environment, and I often run into gradient explosion: parameters become NaN after passing through a linear layer. So I started checking the hyperparameter settings for problems, and the code below raised some questions for me.
```python
parser.add_argument('--alpha_mean_scale', type=float, default=1.0,
                    help='scaling factor of the lagrangian multiplier in the M-step')
parser.add_argument('--alpha_var_scale', type=float, default=100.0,
                    help='scaling factor of the lagrangian multiplier in the M-step')
parser.add_argument('--alpha_scale', type=float, default=10.0,
                    help='scaling factor of the lagrangian multiplier in the M-step')
parser.add_argument('--alpha_mean_max', type=float, default=0.1,
                    help='maximum value of the lagrangian multiplier in the M-step')
parser.add_argument('--alpha_var_max', type=float, default=10.0,
                    help='maximum value of the lagrangian multiplier in the M-step')
parser.add_argument('--alpha_max', type=float, default=1.0,
                    help='maximum value of the lagrangian multiplier in the M-step')
```
Is there a relationship between each alpha and its corresponding alpha_max? What was the author's rationale for choosing these particular values?
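For context, here is my current mental model of how these multipliers might be used in the M-step, written as a minimal sketch. The variable names (`eps_kl_mean`, `update_alpha`, the example KL value) and the exact update rule are my own assumptions, not the repo's code; I'm mainly guessing that `alpha_*_scale` acts as a step size on the dual update and `alpha_*_max` clips the multiplier so the KL penalty in the policy loss stays bounded.

```python
import numpy as np

# Hypothetical sketch of the M-step dual update as I understand it
# (names and update rule are my assumption, not the repository's code).
alpha_mean = 0.0            # Lagrangian multiplier for the mean-KL constraint
alpha_mean_scale = 1.0      # --alpha_mean_scale (assumed: dual step size)
alpha_mean_max = 0.1        # --alpha_mean_max (assumed: clip ceiling)
eps_kl_mean = 1e-2          # KL bound on the policy mean (assumed value)

def update_alpha(alpha, kl, eps, scale, alpha_max):
    # If the measured KL exceeds its bound, increase alpha so the
    # penalty term alpha * KL in the policy loss gets stronger;
    # otherwise decrease it. The clip to [0, alpha_max] is what I
    # assume alpha_*_max is for: keeping the penalty from blowing up.
    alpha = alpha + scale * (kl - eps)
    return float(np.clip(alpha, 0.0, alpha_max))

kl_mean = 0.05  # example measured KL for one M-step iteration
alpha_mean = update_alpha(alpha_mean, kl_mean, eps_kl_mean,
                          alpha_mean_scale, alpha_mean_max)
print(alpha_mean)  # 0.04 here, still below alpha_mean_max
```

If that reading is correct, I'd expect alpha_max to cap how hard the KL constraint can be enforced, but I don't see why the defaults differ so much between the mean, variance, and discrete cases, or whether they interact with the NaNs I'm seeing.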