DDPG Critic Head Bug Fix

IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

https://intellabs.github.io/coach/

Apache License 2.0

2.33k stars, 461 forks

Status: Closed (gal-leibovich closed 5 years ago)

gal-leibovich commented 5 years ago

A bug fix for DDPG, where the update to the policy network was based on the sum of the critic's Q predictions on the batch instead of their mean.
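
To illustrate why the reduction matters, here is a minimal NumPy sketch. It is not Coach's code: the linear actor, the linear critic, and the names `q_values`, `actor_loss`, and `numerical_grad` are hypothetical stand-ins chosen for the example. It shows that reducing the critic's Q predictions with a sum instead of a mean scales the gradient reaching the policy parameters by the batch size.

```python
import numpy as np


def q_values(states, theta, w_a=0.5):
    """Toy critic Q(s, a) = w_a * a + sum(s), with a toy linear actor a = s @ theta.

    Both models are assumptions made for illustration only.
    """
    actions = states @ theta
    return w_a * actions + states.sum(axis=1)


def actor_loss(states, theta, reduction):
    """Negative Q over the batch, reduced by either 'sum' or 'mean'."""
    q = q_values(states, theta)
    return -q.sum() if reduction == "sum" else -q.mean()


def numerical_grad(states, theta, reduction, eps=1e-6):
    """Central-difference gradient of the actor loss w.r.t. the actor parameters."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        plus = actor_loss(states, theta + step, reduction)
        minus = actor_loss(states, theta - step, reduction)
        grad[i] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
batch_size = 64
states = rng.normal(size=(batch_size, 3))
theta = rng.normal(size=3)

grad_sum = numerical_grad(states, theta, "sum")    # behaviour before the fix
grad_mean = numerical_grad(states, theta, "mean")  # behaviour after the fix

# The two gradients point in the same direction but differ by a factor of batch_size.
print(grad_sum / grad_mean)
```

With a sum, the size of the policy update depends on the batch size; with a mean, it does not.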

gal-leibovich commented 5 years ago

Average return over 1 million time steps:

Level	After This Bug Fix	Previous Benchmark Results	DDPG from TD3's Repo
HalfCheetah	7500	6000	~3100
Hopper	2800	2500	~1750
Walker2D	3100	3400	~1500
Ant	250 (learning curve does not look good)	700 (learning curve does not look good)	900
Reacher	-4.3 (WIP)	-4.5	-6.5

[Figures: per-environment plots for Half Cheetah, Hopper, Walker2D, Ant, Reacher]