pchalasani opened this issue 1 year ago
Hello,
"can potentially help improve training stability in DRL"

Do you have experimental results to back this claim?
In the paper linked in the blog post, results are reported for A2C/DDPG only (which usually yield weaker results than PPO/TD3/SAC), and only 3 random seeds were used, which is not enough to account for noise in the results.
torchcontrib is also now archived and hasn't received any updates for almost 3 years (https://github.com/pytorch/contrib).
EDIT: SWA now seems to be available directly in PyTorch: https://pytorch.org/docs/stable/optim.html#stochastic-weight-averaging
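For reference, here is a minimal sketch of how the built-in torch.optim.swa_utils API is typically used; the model, data, and hyperparameters below are arbitrary placeholders, not taken from any RL codebase:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

# Placeholder model and data; any nn.Module and DataLoader would do.
model = torch.nn.Linear(10, 2)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)
loss_fn = torch.nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)                # keeps a running average of the weights
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
swa_scheduler = SWALR(optimizer, swa_lr=0.05)   # LR schedule used during the averaging phase
swa_start = 75                                  # epoch at which averaging begins (arbitrary)

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)      # fold current weights into the average
        swa_scheduler.step()
    else:
        scheduler.step()

# Recompute BatchNorm statistics for the averaged model before evaluation.
torch.optim.swa_utils.update_bn(loader, swa_model)
```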
Thanks, I did not know SWA was in mainline PyTorch. I will look into it. As for empirical evidence, I'll continue experimenting and report back.
🚀 Feature
Stochastic Weight Averaging (SWA) is a recently proposed technique that can potentially help improve training stability in DRL. There is now an implementation in torchcontrib; see the PyTorch SWA page for more.
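For context, the torchcontrib package mentioned above exposes SWA as a wrapper around an existing optimizer. A minimal sketch of that usage follows; the network, base optimizer, and schedule values are arbitrary placeholders:

```python
import torch
from torchcontrib.optim import SWA  # archived package: pytorch/contrib

model = torch.nn.Linear(10, 2)      # placeholder network
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Start averaging at step 10, fold weights in every 5 steps, using LR 0.05.
opt = SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)

for step in range(100):
    x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))  # dummy batch
    opt.zero_grad()
    torch.nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

opt.swap_swa_sgd()  # replace model weights with their SWA running average
```

Note that swap_swa_sgd() overwrites the model's weights with their running average, so it is meant to be called once, after training and before evaluation.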
Motivation
SWA might improve training stability as well as final reward in some DRL scenarios. It may also reduce sensitivity to random-seed initialization.
Pitch
See above :)
Alternatives
No response
Additional context
See the PyTorch SWA page for more.