Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Final .nn model and training phase have different behaviours. #2105

Closed Erethan closed 5 years ago

Erethan commented 5 years ago

I trained a network until I got the behaviour I needed. However, whenever I import the resulting model into my Agent's Brain, the agent takes completely different actions. What could have caused that?

If I run mlagents-learn with --load, the agent correctly resumes training from where it left off and recovers the intended behaviour. The issue only appears during inference.
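For reference, resuming from the previous checkpoint looks roughly like this (the config path and run ID are placeholders for whatever the original run used):

```
mlagents-learn config/trainer_config.yaml --run-id=MyRun --train --load
```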

shihzy commented 5 years ago

CC: @ervteng - do you have any thoughts on this scenario?

ervteng commented 5 years ago

@mantasp

Hi @Erethan, what do your observation space and action space look like? Are you using LSTMs? There may be an issue with Barracuda for certain model architectures - this will help us narrow down the issue.

Erethan commented 5 years ago

My hyperparameters were the same as the Pyramids example, so I don't think I'm using an LSTM.

The goal of the agent is to pick up a resource by standing in front of it, then return to the center of the environment.

Observation space:

Action space is discrete, with two branches:
1st int -> horizontal input [-1, -0.5, 0, 0.5, 1]
2nd int -> vertical input [-1, -0.5, 0, 0.5, 1]
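As a rough illustration of that mapping (the snippet below is purely illustrative, not code from the project):

```python
# Each discrete branch picks an index 0-4, which maps to an input value.
BRANCH_VALUES = [-1.0, -0.5, 0.0, 0.5, 1.0]

def decode_action(horizontal_index, vertical_index):
    """Convert the two branch indices into (horizontal, vertical) input values."""
    return BRANCH_VALUES[horizontal_index], BRANCH_VALUES[vertical_index]

print(decode_action(4, 2))  # (1.0, 0.0): full horizontal input, no vertical input
```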

Behaviour during training: the agent moves somewhat erratically towards the resource, but quickly picks it up and then returns to the center.
Behaviour during inference: the agent goes to some specific position and jitters around it.

If it would help, I could upload and share the project as well as the model's folder.

ervteng commented 5 years ago

Cool, thanks for the info! We'll definitely look into this. Another question: have you tried doing inference using Python with --slow, and does that change the behavior? There are sometimes differences in environment behavior depending on the timescale.

Erethan commented 5 years ago

I haven't tried it. I didn't know you could also do inference with the Python API. Could you point me to a resource on this subject?

Also, I've found this on the Limitations documentation page:

Rendering Speed and Synchronization
Currently the speed of the game physics can only be increased to 100x real-time. The Academy also moves in time with FixedUpdate() rather than Update(), so game behavior implemented in Update() may be out of sync with the agent decision making. See Execution Order of Event Functions for more information.

Could this be related to my problem?

EDIT: I understand now that you can do either inference or training with the Python API (mlagents.envs vs mlagents-learn). Is that correct? If so, I would need to create an environment executable and then follow the instructions detailed here and here. Is there any way I can do that from the command line instead of writing a Python script?
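For reference, a minimal sketch of stepping an environment build through the Python API of that era (mlagents.envs.UnityEnvironment). The build path is a placeholder, and it feeds random actions only to show the API mechanics; it does not run the trained policy - as answered below, mlagents-learn without --train handles that:

```python
import numpy as np
from mlagents.envs import UnityEnvironment

# Point file_name at an environment build; file_name=None connects to the Editor instead.
env = UnityEnvironment(file_name="builds/MyEnvironment")

brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# train_mode=False runs the environment at normal, real-time speed (like --slow).
env_info = env.reset(train_mode=False)[brain_name]

done = False
while not done:
    # Placeholder policy: a random index for each discrete branch (assumes a single agent).
    action = np.array([[np.random.randint(size) for size in brain.vector_action_space_size]])
    env_info = env.step(action)[brain_name]
    done = env_info.local_done[0]

env.close()
```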

ervteng commented 5 years ago

Hi @Erethan, you don't need to write any more Python code. Just add the --slow param when running mlagents-learn and leave off --train; the Python code will then run in inference mode.
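Concretely, something along these lines (keeping --load from the earlier command so the trained model is restored; config path and run ID are placeholders):

```
mlagents-learn config/trainer_config.yaml --run-id=MyRun --load --slow
```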

If you're still having issues with Barracuda, try using the develop-barracuda-0.2.0 branch of this repo. We've fixed some issues with Discrete actions there, and will merge these changes soon.

Erethan commented 5 years ago

--slow seems to have fixed the problem. Thank you!

I much preferred training with --slow. You can run with many more environments (I ran with 30x more), and I can better understand what the agents are doing. Loved it!

Is there any setback I should be worried about when training with --slow?

DooblyNoobly commented 5 years ago

Hi @Erethan, I am having the same problem. Are you using --slow when training instead of --train?

Erethan commented 5 years ago

When the issue occurred, I wasn't using --slow.

Turning --slow on is what solved my issue. (Just to be clear: you have to pass both --slow and --train.)
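In other words, something like this (config path and run ID being whatever you already use):

```
mlagents-learn config/trainer_config.yaml --run-id=MyRun --train --slow
```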

DooblyNoobly commented 5 years ago

Thanks! I'll give that a go

vincentpierre commented 5 years ago

Thank you for the discussion. We are closing this issue due to inactivity.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.