Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Question about the curriculum learning curve #3721

Closed davidsonic closed 4 years ago

davidsonic commented 4 years ago

Describe the bug
Hi, I was not able to find a related question after searching the existing issues, and I am wondering whether the evaluation curve in the documentation is unfair for comparing a pre-specified curriculum against learning from scratch, since the two runs are evaluated not at the same fixed height (e.g., the maximum height) but at the different heights encountered during training.

I ran an experiment that simplified the environment setup by removing one of the brains (BigWallBrain), with a total of 5M training steps and a more fine-grained curriculum (see below). The plot is also attached (averaged over 2 seeds). From the plot, the auto-curriculum looks great, but it is clearly being evaluated on lower walls as training progresses, whereas the other curve is always evaluated at a wall height of 8. I think it is only fair to say the auto-curriculum beats learning from scratch if it is also always evaluated at height 8.

The second phenomenon observed in the plot is that learning from scratch actually outperforms the auto-curriculum after 3.5M (5M × 0.7) steps, once the wall height goes beyond 7.5. I think it was necessary to remove the second brain, since it only serves as a confounding factor. The fact that the auto-curriculum did not build on existing skills defeats the purpose of the curriculum learning proposed here.

I was trying to fix the first problem myself (I am not sure whether it should be done from the C# script, by monitoring the summary_freq parameter, or from the mlagents EnvManager), and any suggestions would be appreciated; a rough sketch of one possible approach is below. For the second problem, I guess this is why two brains were used in the first place when this feature was introduced?
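For reference, a minimal sketch of one possible fix: run a separate evaluation pass through the low-level Python API with the wall height pinned to 8. The "WallJump" build name is a placeholder, and EnvironmentParametersChannel is assumed to be available (older releases expose a similar FloatPropertiesChannel instead):

```python
# Sketch only: evaluate a trained policy at a fixed wall height of 8,
# independent of the lesson the curriculum has reached during training.
# Assumes a Unity build named "WallJump" (placeholder) and an ML-Agents
# release that provides EnvironmentParametersChannel.
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

channel = EnvironmentParametersChannel()
env = UnityEnvironment(file_name="WallJump", side_channels=[channel])
channel.set_float_parameter("small_wall_height", 8.0)  # pin the eval height
env.reset()

behavior_name = list(env.behavior_specs)[0]
total_reward, episodes = 0.0, 0
while episodes < 10:
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    total_reward += terminal_steps.reward.sum()
    episodes += len(terminal_steps)
    # Feed actions from the trained policy here via env.set_actions(...);
    # without it, step() sends empty (do-nothing) actions.
    env.step()

print(f"mean reward at height 8: {total_reward / max(episodes, 1):.3f}")
env.close()
```

Logging this fixed-height reward alongside the training reward would make the two curves directly comparable.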

Curriculum file:

```yaml
SmallWallJump:
  measure: progress
  thresholds: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
  min_lesson_length: 100
  signal_smoothing: true
  parameters:
    small_wall_height: [0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8]
```
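To make the threshold-to-lesson mapping concrete, here is a minimal illustrative sketch (not the actual ML-Agents implementation, which additionally applies min_lesson_length and signal smoothing) of how a progress-based curriculum with these thresholds selects small_wall_height:

```python
# Illustrative sketch only: map training progress in [0, 1] to the
# small_wall_height lesson implied by the curriculum file above.
THRESHOLDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
HEIGHTS = [0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8]

def wall_height(progress: float) -> float:
    """Lesson index = number of thresholds the current progress has passed."""
    lesson = sum(progress >= t for t in THRESHOLDS)
    return HEIGHTS[lesson]

assert wall_height(0.0) == 0      # first lesson: flat ground
assert wall_height(0.35) == 3.5   # three thresholds passed at 35% progress
assert wall_height(0.95) == 8     # final lesson: full wall height
```

Under this mapping, the final height of 8 is only reached in the last 20% of training, which is why the auto-curriculum curve is evaluated on lower walls for most of the run.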

Screenshots

Without smoothing: [plot image]

With smoothing: [plot image]


awjuliani commented 4 years ago

Hi @davidsonic

This is a good point you've raised. The truth is that the example we provide in the curriculum learning documentation is based on a very early version of the Wall Jump environment, where a curriculum was indeed necessary. Due to improvements in our learning algorithms and changes in the environment itself, the current WallJump no longer needs a curriculum to learn, as you have pointed out here. Instead, we use WallJump as a means of demonstrating the ability to switch behaviors within a single agent.

We are currently working on more complex environments that do genuinely need a curriculum, and we will share those examples in the future. For now, I will amend the curriculum learning documentation to note that the example provided there is for instructional purposes and does not correspond to the current WallJump.

awjuliani commented 4 years ago

I have created a PR to let users know that the documented example is not reproducible with the current environment: https://github.com/Unity-Technologies/ml-agents/pull/3723. As such, I will close this issue, but please add a comment or re-open it if you feel it is necessary.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.