google-deepmind / dm_control

Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Apache License 2.0

Comparison of benchmarking results #41

Closed JAEarly closed 5 years ago

JAEarly commented 5 years ago

In the tech report, there are two tables of benchmark scores for the different environments, achieved by a couple of different algorithms. Whilst this is a good starting point for comparing techniques, it would be useful to have an online system for comparing results.

A couple of ideas spring to mind:

  1. A results table in the wiki section of this repository. Users would list their scores along with a write-up and evidence. This would need clear instructions on the conditions under which to evaluate an agent (episode length, number of evaluation episodes, limit on training frames/time etc.).
  2. An automated system for evaluating agents - submit some code that takes actions in an environment based on the presented observations (see the sketch after this list). This would allow for consistent reporting and comparison of results, but is more complex than the first option.
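
To make the second idea concrete, here is a minimal sketch of what a submission interface and a fixed evaluation protocol might look like, using the public `dm_control.suite` API. The `evaluate` helper, the `cartpole` `swingup` task, the seed, and the 10-episode protocol are all illustrative assumptions, not anything this repo specifies.

```python
import numpy as np

from dm_control import suite


def evaluate(env, policy, num_episodes=10):
    """Returns the mean and std of episode returns for `policy` on `env`."""
    returns = []
    for _ in range(num_episodes):
        time_step = env.reset()
        episode_return = 0.0
        while not time_step.last():
            time_step = env.step(policy(time_step.observation))
            episode_return += time_step.reward
        returns.append(episode_return)
    return np.mean(returns), np.std(returns)


# Fixed conditions a leaderboard entry would need to pin down: task, seed,
# and number of evaluation episodes. Control Suite episodes run to a fixed
# time limit (1000 steps for most tasks), so returns lie in [0, 1000].
env = suite.load("cartpole", "swingup", task_kwargs={"random": 0})
spec = env.action_spec()
rng = np.random.RandomState(0)


def random_policy(observation):
    """A uniform-random baseline standing in for a submitted agent."""
    del observation  # This baseline ignores the observation.
    return rng.uniform(spec.minimum, spec.maximum, size=spec.shape)


mean_return, std_return = evaluate(env, random_policy, num_episodes=10)
print(f"Mean return over 10 episodes: {mean_return:.1f} +/- {std_return:.1f}")
```

The only contract an automated evaluator would need to fix is the one `evaluate` assumes: a policy is any callable mapping an observation dict to an action satisfying the environment's `action_spec()`.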
zuoxingdong commented 5 years ago

I think this repo mainly focuses on providing Python bindings for interacting with the MuJoCo simulator, along with a set of standardized environments. I'm not sure that adding functionality for benchmarking different agents is worthwhile within this repo. It might be better to have a separate project for that.

alimuldal commented 5 years ago

Our goal is to provide a collection of physics-based reinforcement learning environments and a toolkit for designing new ones. Leaderboards and frameworks for automated agent evaluation are a bit beyond the scope of what we're aiming to do in this repo.