HumanCompatibleAI / population-irl

(Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
MIT License
26 stars, 2 forks

Video game environment #5

Closed: AdamGleave closed this issue 6 years ago

Discordius commented 6 years ago

This should be done now! Check out the latest commit.

AdamGleave commented 6 years ago

Looks good. I haven't had time to train an RL agent to test it, but it looks fine to me with a random actor, so I'm closing this issue. Thanks @xuweijiezds and @Discordius!