Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Tennis agent rewards #2638

Closed: wardVD closed this issue 4 years ago.

wardVD commented 5 years ago

Hi, I'd like to create a game similar to the Tennis example. However, I don't quite understand some parts of the following script: https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/HitWall.cs

L 34: Why is lastAgentHit set to 0? When agentB hits the ball and it flies over the net, lastAgentHit is set to 0, which indicates that agentA was the last one to hit the ball. Why is this?
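To make the L 34 question concrete, here is a hypothetical Python sketch (not the actual HitWall.cs code) that replays a rally and tracks the last hitter. It assumes lastAgentHit == 0 means agentA, as the question states; the value 1 for agentB and the starting value -1 are assumptions made only for this illustration. With the reset on crossing the net, a ball hit by agentB ends up credited to agentA.

```python
# Hypothetical sketch of the last-hitter bookkeeping described in the question.
# Assumption: 0 = agentA (stated in the question); 1 = agentB is assumed here.
AGENT_A, AGENT_B = 0, 1


def track_last_hitter(events, reset_on_net=True):
    """Replay events ("hitA", "hitB", "over_net") and return lastAgentHit.

    reset_on_net mimics the line in question, which sets lastAgentHit to 0
    (agentA) whenever the ball crosses the net.
    """
    last_agent_hit = -1  # hypothetical "nobody has hit yet" value
    for event in events:
        if event == "hitA":
            last_agent_hit = AGENT_A
        elif event == "hitB":
            last_agent_hit = AGENT_B
        elif event == "over_net" and reset_on_net:
            last_agent_hit = AGENT_A
    return last_agent_hit


# agentB hits, the ball crosses the net, and agentA is now recorded as the hitter.
print(track_last_hitter(["hitB", "over_net"]))                      # 0
print(track_last_hitter(["hitB", "over_net"], reset_on_net=False))  # 1
```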

L 48: Why is the total reward of agentB set to 0? I believe this block means that if agentA hit the ball last but the ball then hits the wall behind him, agentA gets a negative reward (the documentation gives a penalty of 0.1, while the code here uses 0.01). What I don't understand is why agentB's reward becomes zero. Say agentA has hit the ball over the net 10 times and agentB 9 times. Then agentA ends with 10 * 0.1 - 0.01 = 0.99, while agentB ends with 0, even though agentB won the game. What am I missing?
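The arithmetic in the L 48 question can be reproduced with a small hypothetical Python tally. The reward values (+0.1 per hit over the net, a 0.01 penalty) and the reading that agentB's total is reset to 0 are taken from the question as stated, not verified against HitWall.cs or the ML-Agents reward API.

```python
# Hypothetical tally reproducing the arithmetic in the question.
# Assumptions (from the question, not verified against HitWall.cs):
#   +0.1 per hit over the net, -0.01 for the agent who hit last when the
#   ball hits the wall, and the opponent's reward reset to 0 at that point.
HIT_OVER_NET_REWARD = 0.1
WALL_PENALTY = 0.01


def episode_rewards(hits_a, hits_b):
    """Return (reward_a, reward_b) for a rally ending in the case described
    in the question: agentA hit last and the ball hits the wall."""
    reward_a = hits_a * HIT_OVER_NET_REWARD
    reward_b = hits_b * HIT_OVER_NET_REWARD  # 0.9 for 9 hits, before the reset
    reward_a -= WALL_PENALTY  # agentA hit last, so it is penalised
    reward_b = 0.0            # the question reads L 48 as zeroing agentB's total
    return reward_a, reward_b


a, b = episode_rewards(10, 9)
print(round(a, 2), b)  # 0.99 0.0
```

Under this reading, agentB's 9 successful returns leave no trace in its reward, which is exactly the discrepancy the question points out.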

L 77-79 and 83-85: What is the difference between these lines? Why are they in different branches of the if/else?

Thank you very much in advance.

chriselion commented 5 years ago

Hi @wardVD, regarding https://github.com/Unity-Technologies/ml-agents/blob/493c75bf683d35d512ae6fb57d4a1a332116df15/UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/HitWall.cs#L34: agreed, that's weird. I'll try to find more historical information on it.

Regarding https://github.com/Unity-Technologies/ml-agents/blob/493c75bf683d35d512ae6fb57d4a1a332116df15/UnitySDK/Assets/ML-Agents/Examples/Tennis/Scripts/HitWall.cs#L75-L87: the floorB block below repeats itself in the same way. I think in both cases the lastAgentHit check can simply be removed, since both branches do the same thing.
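Why identical branches make the lastAgentHit check removable can be shown with a hypothetical Python sketch. The function names and the reward updates inside them are placeholders invented for illustration, not the contents of HitWall.cs. The only assumption carried over from the comment is that both branches apply the same updates.

```python
# Hypothetical illustration: when both branches of a condition apply the
# same updates, the condition has no effect and can be dropped.
# The reward updates below are placeholders, not taken from HitWall.cs.


def on_floor_b_with_check(last_agent_hit, rewards):
    # Both branches are identical, so last_agent_hit never changes the result.
    if last_agent_hit == 0:
        rewards["agentB"] -= 0.01
        rewards["agentA"] = 0.0
    else:
        rewards["agentB"] -= 0.01
        rewards["agentA"] = 0.0
    return rewards


def on_floor_b_simplified(rewards):
    # Same behaviour with the lastAgentHit check removed.
    rewards["agentB"] -= 0.01
    rewards["agentA"] = 0.0
    return rewards


for last in (-1, 0, 1):
    before = on_floor_b_with_check(last, {"agentA": 0.3, "agentB": 0.2})
    after = on_floor_b_simplified({"agentA": 0.3, "agentB": 0.2})
    print(last, before == after)  # True for every value of last
```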

I'll do some more digging on the 0 reward and get back to you.

wardVD commented 5 years ago

Hi @chriselion,

Thank you for the response. Could you also report back on the reasoning behind the rewards? I believe that is the core of the RL setup.

Thank you.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

chriselion commented 5 years ago

Hi @wardVD, sorry for the delay on this.

The original author of the Tennis scene is currently on leave, so it's hard to get a first-hand answer. However, the consensus from the rest of the team is that it's a hard scene to train in general, so some of the values may have been hacks to get it training in the first place and haven't been revisited since.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.