gal-leibovich closed 5 years ago
Average return over 1 million time steps:
Level | After This Bug Fix | Previous Benchmark Results | DDPG from TD3's Repo |
---|---|---|---|
HalfCheetah | 7500 | 6000 | ~3100 |
Hopper | 2800 | 2500 | ~1750 |
Walker2D | 3100 | 3400 | ~1500 |
Ant | 250 (learning curve does not look good) | 700 (learning curve does not look good) | 900 |
Reacher | -4.3 (WIP) | -4.5 | -6.5 |
Per-environment plots (images not preserved): HalfCheetah, Hopper, Walker2D, Ant, Reacher.
A bug fix for DDPG: the policy network update was based on the sum of the critic's Q predictions over the batch instead of their mean.
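
To illustrate why the reduction matters, here is a minimal sketch in plain Python. It is not the code from this repository: the linear policy `a = w * s`, the quadratic critic, and the names `dq_da` and `policy_gradient` are all assumptions made up for the example. It shows that reducing with a sum yields a policy gradient that is the mean-based gradient multiplied by the batch size, so under plain gradient ascent the effective actor step size would grow with the batch size.

```python
# Toy setup (all names hypothetical, not taken from the repository):
# a linear policy a = w * s and a fixed quadratic critic
# Q(s, a) = -(a - 2 * s) ** 2.

def dq_da(s, a):
    """Derivative of the toy critic with respect to the action."""
    return -2.0 * (a - 2.0 * s)


def policy_gradient(w, states, reduce="mean"):
    """Gradient of the batch-reduced Q-values with respect to w."""
    # Chain rule: dQ/dw = dQ/da * da/dw, and da/dw = s for a = w * s.
    per_sample = [dq_da(s, w * s) * s for s in states]
    total = sum(per_sample)
    if reduce == "sum":
        return total
    return total / len(per_sample)


states = [0.5, -1.0, 2.0, 1.5]
w = 0.0

grad_sum = policy_gradient(w, states, reduce="sum")    # sum over the batch
grad_mean = policy_gradient(w, states, reduce="mean")  # mean over the batch

# The two gradients differ exactly by the batch size.
print(grad_sum, grad_mean, grad_sum / grad_mean)
```

With a sum, changing the batch size silently rescales the actor update, whereas a mean keeps its magnitude independent of the batch size.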