HumanCompatibleAI / population-irl

(Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
MIT License
26 stars, 2 forks

Implement MaxEnt IRL #2

Closed: AdamGleave closed this issue 6 years ago