Document the structure of the output neural network

Reginhar commented 4 years ago

Is your feature request related to a problem? Please describe.

There is no documentation about the structure of the neural networks output by ml-agents. If you visualize a model using Netron, you would expect the network to be just a number of Dense layers of hidden nodes with activation functions. However, it is much more complicated: the graph contains many additional operations whose purpose is not explained anywhere.

Describe the solution you'd like

The solution is to have a page of documentation about the structure of the neural networks. For example, it could say "if you have enabled 'normalize' in the config, the model will prepend ... these and these operations ... before the inference calculation." It could also explain the use of the action_masks input, the difference between the action and action_probs output nodes, the difference in the model when choosing discrete vs continuous action spaces, and the calculations done between the last layer and the output node.
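To make the request concrete, here is a minimal sketch of the kind of post-processing such a page might describe. It is not taken from the ml-agents source. The function names (`masked_softmax`, `sample_discrete`, `sample_continuous`) are hypothetical, the mask convention (1 = allowed, 0 = blocked) is an assumption, and so is the reading that action_probs holds a probability distribution while action holds the chosen action.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over the allowed entries only (assumed convention: 1 = allowed)."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(probs, rng):
    """Discrete branch: draw one action index from the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_continuous(means, log_stds, rng):
    """Continuous branch (assumed Gaussian policy): one sample per dimension."""
    return [rng.gauss(m, math.exp(s)) for m, s in zip(means, log_stds)]


rng = random.Random(0)
probs = masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])  # second action is blocked
action = sample_discrete(probs, rng)
print(probs, action)
```

In this sketch a blocked action always gets probability zero, which would explain why a mask has to be an input of the model rather than something applied afterwards. Whether the exported models work this way is exactly what the requested documentation would need to confirm.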

This would help to understand what ml-agents is doing, instead of it being a black box. If you understand what ml-agents is doing, it is possible to make informed choices about e.g. parameters and reward functions when training agents. It would also make it easier to help with the development of ml-agents. Additionally, it would help when doing inference outside of Unity. I know that inference outside Unity is not currently supported, but it is an often requested feature and it is very interesting: one could train robot behavior with a Unity model (digital twin) using ml-agents and export this model to the real world.

Describe alternatives you've considered

The alternative would be to not write documentation. The inference process will then remain a black box unless you read the source code of ml-agents, and understanding that code takes a lot of time. Also, the source code mostly shows what is happening (e.g. a Div operation is done in the network) and not the reasoning behind it (e.g. this Div operation is here because we are dividing by the running mean, which is part of the normalization step, which we do because ...). Hence, by reading the source code it is in principle possible to know exactly what is happening, but not why it is happening.

Additional context

See also the forum thread I posted about this. It includes some more explanation and a nice picture.

https://forum.unity.com/threads/structure-of-the-output-neural-network.959304/

harperj commented 4 years ago

Hi @Omniscimus -- there are some tricky things about maintaining a description of the full network architecture. For one, we regularly update the internal structure of the model during development (while keeping the interface the same). Another is that the structure is dynamic: it depends on the behavior and its settings.

One tool I think may be useful for you is Netron (https://github.com/lutzroeder/netron) -- which you can use to visualize TensorFlow, Barracuda, or ONNX model structure. Hopefully this will help you to understand the model structure better. With regard to inference outside Unity, we don't have an explicit plan for this at the moment, but there are general-purpose model serving tools for TensorFlow or ONNX models.
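Alongside a visual tool, an exported ONNX file can also be summarized from a script. The sketch below assumes the third-party `onnx` Python package is installed and that the model was exported in ONNX format. The helper names (`count_op_types`, `describe_model`) and the file path are hypothetical.

```python
from collections import Counter


def count_op_types(nodes):
    """Count how often each operator type appears in a list of graph nodes."""
    return Counter(node.op_type for node in nodes)


def describe_model(path):
    """Print the input names, output names, and operator counts of an ONNX file."""
    import onnx  # third-party; imported here so the counting helper works without it

    graph = onnx.load(path).graph
    print("inputs: ", [value.name for value in graph.input])
    print("outputs:", [value.name for value in graph.output])
    for op_type, count in count_op_types(graph.node).most_common():
        print(f"{op_type}: {count}")


# Example call; "MyBehavior.onnx" is a placeholder path:
# describe_model("MyBehavior.onnx")
```

A summary like this shows which operators are present (for example Sub, Div, or Softmax) and which inputs and outputs the model exposes, but like Netron it does not explain why they are there.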

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Unity-Technologies / ml-agents

Document the structure of the output neural network #4441