IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0
2.33k stars, 461 forks

DDPG Critic Head Bug Fix #344

Closed gal-leibovich closed 5 years ago

gal-leibovich commented 5 years ago

Fixes a bug in DDPG where the update to the policy (actor) network was based on the sum of the critic's Q predictions over the batch instead of their mean.
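To illustrate why the reduction matters, here is a minimal NumPy sketch. It is not Coach's code: the linear actor, the fixed quadratic critic `Q(s, a) = -||a||^2`, and the helper `actor_gradient` are all hypothetical and chosen only so the gradient can be written by hand. It compares the gradient of the actor weights when the critic's outputs are summed over the batch with the gradient when they are averaged.

```python
import numpy as np


def actor_gradient(states, weights, reduction):
    """Gradient of the batch-reduced critic output w.r.t. the actor weights.

    Toy setup (hypothetical, not Coach's implementation):
      actor:  a = s @ W            (linear policy)
      critic: Q(s, a) = -||a||^2   (fixed), so dQ/da = -2a
    """
    actions = states @ weights        # shape (batch, action_dim)
    dq_da = -2.0 * actions            # shape (batch, action_dim)
    # Chain rule: d(sum_i Q_i)/dW = sum_i s_i^T (dQ/da_i)
    grad_of_sum = states.T @ dq_da    # shape (state_dim, action_dim)
    if reduction == "sum":
        return grad_of_sum
    if reduction == "mean":
        return grad_of_sum / states.shape[0]
    raise ValueError(f"unknown reduction: {reduction!r}")


rng = np.random.default_rng(0)
batch_size, state_dim, action_dim = 64, 3, 2
states = rng.normal(size=(batch_size, state_dim))
weights = rng.normal(size=(state_dim, action_dim))

grad_sum = actor_gradient(states, weights, "sum")
grad_mean = actor_gradient(states, weights, "mean")

# The two gradients point in the same direction; only the scale differs.
ratio = np.linalg.norm(grad_sum) / np.linalg.norm(grad_mean)
print(f"batch size: {batch_size}, gradient norm ratio (sum / mean): {ratio:.1f}")
```

In this toy setting the summed objective yields a gradient that is `batch_size` times larger than the averaged one, so for a plain gradient step the effective actor step size grows with the batch size. How much this affects training in practice depends on the optimizer and hyperparameters in use.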

gal-leibovich commented 5 years ago

Average return over 1 million time steps:

| Level | After This Bug Fix | Previous Benchmark Results | DDPG from TD3's Repo |
|---|---|---|---|
| HalfCheetah | 7500 | 6000 | ~3100 |
| Hopper | 2800 | 2500 | ~1750 |
| Walker2D | 3100 | 3400 | ~1500 |
| Ant | 250 (learning curve does not look good) | 700 (learning curve does not look good) | 900 |
| Reacher | -4.3 (WIP) | -4.5 | -6.5 |

Images: Half Cheetah, Hopper, Walker2D, Ant, Reacher