Closed slinlee closed 2 years ago
We'll test it in test first. (same as #437)
Enhances auto-normalization performance, especially for multi-agent simulations.
fixes freezing + auto-normalization
ability to disable auto-normalization while still using the reward term interface
sets policy learning rate to 0.0 until PBT mutation (for establishing initial betas)
introduces beta learning rate + beta running average (calculated betas become update target)
beta learning rate schedule
multi-agent reward term contributions aggregated by maximum contributions among agents
beta calculation based on 1 / max(abs(term_contrib_min), abs(term_contrib_max))
default beta target set to 1.0
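
A minimal sketch of how the beta items above could fit together, not the code from this PR. All function and variable names are hypothetical. It assumes the "default beta target" of 1.0 is the fallback when a term has no contribution yet, and that the beta learning rate moves a running beta toward the calculated target.

```python
from typing import Dict, List, Tuple

# Hypothetical type: (term_contrib_min, term_contrib_max) for one reward term.
TermRange = Tuple[float, float]


def aggregate_max_abs(per_agent: List[Dict[str, TermRange]]) -> Dict[str, float]:
    """For each reward term, keep the largest absolute contribution among agents."""
    aggregated: Dict[str, float] = {}
    for ranges in per_agent:
        for term, (contrib_min, contrib_max) in ranges.items():
            magnitude = max(abs(contrib_min), abs(contrib_max))
            aggregated[term] = max(aggregated.get(term, 0.0), magnitude)
    return aggregated


def target_beta(max_abs_contrib: float, default_target: float = 1.0) -> float:
    """beta target = 1 / max(abs(min), abs(max)).

    Assumption: fall back to the default target (1.0) when the term has
    contributed nothing yet, which also avoids dividing by zero.
    """
    if max_abs_contrib > 0.0:
        return 1.0 / max_abs_contrib
    return default_target


def update_beta(beta: float, target: float, beta_lr: float) -> float:
    """Move the running beta toward the calculated target by a fraction beta_lr."""
    return beta + beta_lr * (target - beta)


# Two agents, two reward terms (made-up numbers for illustration only).
agents = [
    {"speed": (-2.0, 4.0), "crash": (-10.0, 0.0)},
    {"speed": (-5.0, 1.0), "crash": (-3.0, 0.0)},
]
max_abs = aggregate_max_abs(agents)
targets = {term: target_beta(value) for term, value in max_abs.items()}
betas = {term: update_beta(1.0, target, beta_lr=0.5) for term, target in targets.items()}
```

Here `speed` aggregates to 5.0 and `crash` to 10.0, so the targets are 0.2 and 0.1. A beta learning rate schedule would simply vary `beta_lr` over training iterations.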