HumanCompatibleAI / population-irl

(Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
MIT License
26 stars, 2 forks

Implemented Reacher PIRL version #14

Closed by Discordius 6 years ago