Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Please restore the GetValueEstimate to the API #4492

Closed Phong13 closed 2 years ago

Phong13 commented 4 years ago

I have been using ML-Agents to train models for a physically simulated character controller. The project makes extensive use of GetValueEstimate(), which was removed from the ML-Agents API in version 0.11. Please put it back. Each time the project is upgraded, I hack ML-Agents to add it back. It is insanely useful; I would not be able to train these models without it.

What is it used for? The character controller (CC) has five models, including WalkRun, Getup, Tumbling, and JumpAirbourne.

During inference, GetValueEstimate() is used to decide whether the WalkRun model is failing (the character is falling over), and then to decide whether the "Getup" or "Tumbling" model should be used next (whichever has the higher value estimate). The same comparison is used for all transitions between all states. I tried doing this with physical measurements and heuristics, but GetValueEstimate() works far better and is much simpler. A rough sketch of this selection logic is below.
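Roughly what that selection looks like in code. This is a minimal sketch assuming GetValueEstimate() were restored on Agent; the threshold, enum, and helper class are hypothetical glue, not part of ML-Agents:

```csharp
using Unity.MLAgents;

// Hypothetical transition selector. GetValueEstimate() is the removed API
// this issue asks to restore; everything else is illustrative.
public enum CharacterState { WalkRun, Getup, Tumbling }

public static class StateSelector
{
    // Value below which the WalkRun policy is treated as failing
    // (character falling over). Project-specific, purely illustrative.
    const float FailingValueThreshold = 0.2f;

    public static CharacterState ChooseNext(Agent walkRun, Agent getup, Agent tumbling)
    {
        // If the critic still expects a good return, keep walking.
        if (walkRun.GetValueEstimate() > FailingValueThreshold)
            return CharacterState.WalkRun;

        // Otherwise hand control to whichever recovery model predicts
        // the higher return from the current physical state.
        return getup.GetValueEstimate() >= tumbling.GetValueEstimate()
            ? CharacterState.Getup
            : CharacterState.Tumbling;
    }
}
```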

During training it is used to train transitions to other states. The "JumpAirbourne" model needs to be able to land in a position that will transition well to the "RunWalk" state. A very easy way to do this is to call RunWalkModel.GetValueEstimate() and reward the JumpAirbourne model based on how well the RunWalk model could continue, roughly as sketched below.
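A sketch of that training-time reward shaping, again assuming a restored GetValueEstimate(); the agent class, field names, and scale factor are hypothetical:

```csharp
using Unity.MLAgents;

// Hypothetical jump agent: when it lands, reward it by how promising the
// landing state looks to the RunWalk critic, so it learns landings that
// transition well into running.
public class JumpAirbourneAgent : Agent
{
    public Agent runWalkAgent;                 // inference-only agent running the RunWalk model
    const float TransitionRewardScale = 0.1f;  // illustrative scale

    // Called by the project's own collision logic when the character lands.
    public void OnLanded()
    {
        // GetValueEstimate() is the requested API: the RunWalk critic's
        // predicted return from the current pose and velocity.
        AddReward(TransitionRewardScale * runWalkAgent.GetValueEstimate());
        EndEpisode();
    }
}
```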

The results are spectacular, but they would not be possible without GetValueEstimate().

[Attached animation: TransitionsBetweenBrains]

Describe the solution you'd like

Please put GetValueEstimate back. It is already generated as a side effect of training, so it is easy to include. Please export it from TensorFlow and expose it in the API.

Please implement it so that the "policy" and "value_estimate" models can be exported to separate .nn files:

This is important for performance. Executing a model that contains the "value_estimate" takes almost twice as long as executing a model that contains only the "policy". Most of the time I want to query either the "value_estimate" or the "policy", but rarely both together. Being able to separate them is a huge performance optimization.

Describe alternatives you've considered

The alternatives are very messy.

Very complicated training scenes that run extra agents to simulate what the transition-to brain would be doing.

For inference the alternative is to fall back to pre-ML heuristic tricks to try to decide if an action is succeeding or failing. Isn't the point of ML to get away from these very limited techniques?

Train all actions into one giant ML model. This is not appealing: it would take a very long time to train, any change or new action would require re-training and re-tuning the entire thing, and the reward function and training scene would be incredibly complicated.

Another agent could be added to the training scene with a new value-estimator model. The inputs would be the same observations, and the output would be a single float that tries to predict the reward the agent would get. This feels silly given that this model is already being trained as a side effect of PPO, but it would be worth the effort if it becomes too hard to hack GetValueEstimate back into the project. A sketch of this fallback is below.
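What that fallback might look like, using only APIs that exist in recent ML-Agents releases (ActionBuffers). The structure is entirely hypothetical: a one-output agent whose observations mirror the main agent's (e.g. via shared sensor components, omitted here) and whose reward is the negative squared prediction error:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// Hypothetical stand-alone value estimator: an agent with a single
// continuous action, trained to predict the main agent's episode return.
public class ValueEstimatorAgent : Agent
{
    public float LatestPrediction { get; private set; }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // The single continuous action is the predicted return.
        LatestPrediction = actions.ContinuousActions[0];
    }

    // Called by the training scene once the main agent's actual return
    // for the episode is known.
    public void ScorePrediction(float actualReturn)
    {
        float error = LatestPrediction - actualReturn;
        AddReward(-error * error);  // regression via RL reward
        EndEpisode();
    }
}
```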

sini commented 4 years ago

I spoke with the author responsible for the original change, and he explained that we removed GetValueEstimate because it's unavailable when using Heuristic / Player control. It would also require a hook into mlagents_envs that is not necessarily useful when using that API directly.

Basically, value estimates are not part of the typical RL loop.

I know we closed your last request on this issue. I'll bring it up for more discussion in our next team meeting.

Phong13 commented 4 years ago

Thanks for considering this.

Please do not underestimate how useful the value estimate is when trying to incorporate trained models into a game. Developers turn to RL to solve very hard problems that are extremely difficult to solve with classic programming. The value estimate is truly magic: it answers the question "How likely is my trained agent to succeed in situation X?" So very useful.

The ML-Agents toolset is great for training, but it is surprisingly difficult to incorporate these models into game systems. The "Value Estimate" would be an amazing gift to game developers.

ruanrothmann commented 4 years ago

Just to concur: the value estimate is hugely valuable in a lot of instances, especially for inference and gameplay applications. For instance, when making a versus AI agent like OpenAI Five or AlphaStar, I can use it to display the agent's win probability. When selecting between multiple agents for a task, it's very useful to have a 'confidence estimate' of how well each agent thinks it might do. It can also be used as a warning system to detect when an agent might be about to fail or succeed at a given task. A sketch of the win-probability idea is below.
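A sketch of that display, assuming a restored GetValueEstimate(); the value range used for normalization is something you would have to measure for your own agent, and the component is hypothetical:

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical HUD component mapping the critic's value estimate to a
// rough win probability for display.
public class WinProbabilityDisplay : MonoBehaviour
{
    public Agent versusAgent;
    public float minValue = -1f;  // empirically observed value range
    public float maxValue = 1f;

    void Update()
    {
        // Normalize the raw value estimate into [0, 1] for the UI.
        float v = versusAgent.GetValueEstimate();
        float winProbability = Mathf.InverseLerp(minValue, maxValue, v);
        Debug.Log($"Estimated win probability: {winProbability:P0}");
    }
}
```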

Phong13 commented 4 years ago

Some more detail about this request:

Please implement it so that the "policy" and "value_estimate" models can be exported to separate .nn files:

This is important for performance. Executing a model that contains the "value_estimate" takes almost twice as long as executing a model that contains only the "policy". Most of the time I want to query either the "value_estimate" or the "policy", but rarely both together. Being able to separate them is a huge performance optimization.

Phong13 commented 3 years ago

Hi Tectonic IP,

It might help to add a comment to the issue on GitHub; the more support there is for this, the more likely it is to get added.

If you are interested, I have modified ML-Agents to provide GetValueEstimate. I could create a fork on GitHub that has this.


TectonicIP commented 3 years ago

Hi Phong13

Thank you for replying, and yes, I would be really interested if you could share a fork of ML-Agents modified to provide GetValueEstimate.

I fully support the importance of having GetValueEstimate() incorporated. Has there been any further development on this, @sini? It would certainly be a game changer.

Many thanks!

Phong13 commented 3 years ago

I will try to get a fork up in the next few days. I am using Release 7, and I plan to upgrade to Release 11 when it comes out in a week or so. There will be fewer changes needed then, as my current branch also includes workarounds for a few bugs.


TectonicIP commented 3 years ago

Hi @Phong13, how have you progressed? Keen to test.

Phong13 commented 3 years ago

Thanks for nudging me.

Here is my fork: Phong13/ml-agents

The branch is release_7_branch_with_value_estimate

There are a few other non-value-estimate changes in there. The main change is in model.py, which adds a TensorFlow identifier for the value estimate, plus some changes to export the model.

On the C# side there are changes so that the value estimate can be retrieved. I modified BehaviorParameters so that it has three models, because most of the time I only need either the action or the value estimate, and it is very expensive to query both. Roughly the shape of this split is sketched below.
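An illustrative sketch of that split, not the actual code in the fork: the Barracuda calls are real API, while the wrapper class and field names are hypothetical.

```csharp
using Unity.Barracuda;

// Hypothetical wrapper keeping the policy and the value head as separate
// Barracuda models, so inference runs only the network actually needed.
public class SplitBrainModels
{
    public NNModel policyModel;         // actions only: fast path, run every step
    public NNModel valueEstimateModel;  // critic only: run when deciding transitions

    IWorker policyWorker;
    IWorker valueWorker;

    public void Init()
    {
        policyWorker = WorkerFactory.CreateWorker(
            WorkerFactory.Type.Auto, ModelLoader.Load(policyModel));
        valueWorker = WorkerFactory.CreateWorker(
            WorkerFactory.Type.Auto, ModelLoader.Load(valueEstimateModel));
    }

    // Run only the critic network; the policy network is untouched.
    public float GetValueEstimate(Tensor observations)
    {
        valueWorker.Execute(observations);
        return valueWorker.PeekOutput()[0];  // single-float value head
    }
}
```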

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.