Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Barracuda: ArgumentException: Off-axis dimensions must match #5417

Closed ohernpaul closed 2 years ago

ohernpaul commented 3 years ago

Describe the bug

I am trying to use a combination of GridSensor, BufferSensor, RayCast sensors, and vector observations. Training runs without errors or failures. However, when I load the ONNX model into my scene, the following error is thrown. It does not happen immediately, but when the agent dies or gets into certain specific scenarios: `ArgumentException: Off-axis dimensions must match`, raised in `Unity.Barracuda.TensorExtensions.Concat (Unity.Barracuda.TensorShape[] shapes, System.Int32 axis)` (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/TensorExtensions.cs:337).

I was on Barracuda 1.4.0 and upgraded to 2.1.0-preview, but the issue persists.

```
ArgumentException: Off-axis dimensions must match
Unity.Barracuda.TensorExtensions.Concat (Unity.Barracuda.TensorShape[] shapes, System.Int32 axis) (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/TensorExtensions.cs:337)
Unity.Barracuda.ModelAnalyzer.ListTemporaryTensorShapes (Unity.Barracuda.Model model, System.Collections.Generic.IDictionary`2[TKey,TValue] inputShapes, System.Collections.Generic.IDictionary`2[System.String,System.Nullable`1[Unity.Barracuda.TensorShape]]& shapesByName) (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/ModelAnalyzer.cs:473)
Unity.Barracuda.PrecompiledComputeOps.PrepareModel (Unity.Barracuda.Model model, System.Collections.Generic.IDictionary`2[TKey,TValue] inputShapes) (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/BarracudaPrecompiledCompute.cs:240)
Unity.Barracuda.StatsOps.PrepareModel (Unity.Barracuda.Model model, System.Collections.Generic.IDictionary`2[TKey,TValue] inputShapes) (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/StatsOps.cs:66)
Unity.Barracuda.GenericWorker+<StartManualSchedule>d__33.MoveNext () (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/GenericWorker.cs:217)
Unity.Barracuda.GenericWorker.Execute () (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/GenericWorker.cs:160)
Unity.Barracuda.GenericWorker.Execute (System.Collections.Generic.IDictionary`2[TKey,TValue] inputs) (at Library/PackageCache/com.unity.barracuda@2.1.0-preview/Barracuda/Runtime/Core/Backends/GenericWorker.cs:145)
Unity.MLAgents.Inference.ModelRunner.DecideBatch () (at Library/PackageCache/com.unity.ml-agents@2.0.0-pre.3/Runtime/Inference/ModelRunner.cs:213)
Unity.MLAgents.Policies.BarracudaPolicy.DecideAction () (at Library/PackageCache/com.unity.ml-agents@2.0.0-pre.3/Runtime/Policies/BarracudaPolicy.cs:125)
Unity.MLAgents.Agent.DecideAction () (at Library/PackageCache/com.unity.ml-agents@2.0.0-pre.3/Runtime/Agent.cs:1360)
Unity.MLAgents.Academy.EnvironmentStep () (at Library/PackageCache/com.unity.ml-agents@2.0.0-pre.3/Runtime/Academy.cs:578)
Unity.MLAgents.AcademyFixedUpdateStepper.FixedUpdate () (at Library/PackageCache/com.unity.ml-agents@2.0.0-pre.3/Runtime/Academy.cs:43)
```
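For readers unfamiliar with the message: a concatenation can only join tensors whose shapes agree on every axis except the one being concatenated along (the "off-axis" dimensions). Per the stack trace, Barracuda raised this while `ModelAnalyzer.ListTemporaryTensorShapes` was precomputing intermediate tensor shapes. That suggests at least one input reaching a `Concat` had an unexpected shape. The Python sketch below illustrates the general rule with plain tuples. `concat_shape` is a hypothetical helper and is not Barracuda's `TensorExtensions.Concat`.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_shape(shapes: Sequence[Shape], axis: int) -> Shape:
    """Return the shape produced by concatenating `shapes` along `axis`.

    Hypothetical helper illustrating the general rule, not Barracuda's code:
    every dimension other than `axis` (the "off-axis" dimensions) must agree.
    ValueError stands in for Barracuda's ArgumentException.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all shapes must have the same rank")
    axis = axis % rank  # allow negative axes, e.g. -1 for the last axis
    for s in shapes[1:]:
        for d in range(rank):
            if d != axis and s[d] != shapes[0][d]:
                raise ValueError("Off-axis dimensions must match")
    # Only the concatenation axis grows; all other dimensions are copied.
    total = sum(s[axis] for s in shapes)
    return tuple(total if d == axis else shapes[0][d] for d in range(rank))
```

For example, `concat_shape([(1, 8), (1, 20)], axis=1)` returns `(1, 28)`. By contrast, `concat_shape([(1, 8), (2, 20)], axis=1)` raises, because the first (off-axis) dimensions differ.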


andrewcoh commented 3 years ago

Hi @ohernpaul

When the agent dies, are you disabling it, i.e. calling `SetActive(false)`? Can you describe the other scenarios where this happens?

chriselion commented 3 years ago

Note: I edited the original post to use triple backticks.

ohernpaul commented 3 years ago

No, I'm not disabling the agents. I learned that lesson the hard way a few months back.

The only additional information I can provide right now:

- I confirmed that the grid sensor is the source of the issue. I removed all other observations from the agent and trained for a few thousand steps with only the grid sensor.
- The issue also happens when I load the model into the agent (inference) and lift the agent off the ground so that neither of the two detectable tags (road, obstacles) is visible. The moment the grid has nothing to detect, it breaks and throws the off-axis error.

It's very strange. I tried to reproduce this with the Food Collector example by training from scratch for 100,000 steps (an arbitrary number). I loaded the model into one of the agents and lifted it off the ground, and it did not throw the off-axis error.

I have since given up on using the grid sensor, even though the version in Release 17 is much better than it has ever been (major props).

I want to say that I appreciate what you have all built, and that you continue to develop this project so rapidly.

dongruoping commented 3 years ago

Hey @ohernpaul, this does sound like a strange issue. Can you share more about your scene and sensor setup, e.g. the grid sensor settings (size/tags/compression)? Did you make your own custom grid sensor, or are you using the original one? What happens when the agent dies that might be triggering the error?

Lifting the agent off the ground, or detecting nothing, shouldn't be a problem. It should just produce an all-zero observation. Does it happen consistently (i.e. does it throw errors every time it detects nothing)?

I experimented a bit with the grid sensor but couldn't reproduce the error. Since this does look concerning, it would be very helpful if you could describe your scene in more detail or upload a minimal project/model that reproduces the error. Thanks!
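The point that detecting nothing should simply yield a zero observation can be illustrated with a one-hot grid encoding. In that encoding, each grid cell has one channel per detectable tag. The flattened observation length therefore depends only on the grid size and the number of tags, never on what is currently detected. The Python sketch below assumes that encoding. `grid_observation` and its parameters are hypothetical and do not reproduce ML-Agents' `GridSensor` implementation.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (x, z) grid coordinates


def grid_observation(
    detections: Dict[Cell, str],
    width: int,
    height: int,
    tags: Sequence[str],
) -> List[float]:
    """Flattened one-hot grid observation (hypothetical sketch, not ML-Agents' GridSensor).

    Each cell gets one channel per detectable tag. Cells with no detection stay
    all-zero, so the length is always width * height * len(tags).
    """
    channels = len(tags)
    obs = [0.0] * (width * height * channels)
    for (x, z), tag in detections.items():
        # Ignore unknown tags and out-of-bounds cells instead of resizing the output.
        if tag in tags and 0 <= x < width and 0 <= z < height:
            index = (z * width + x) * channels + tags.index(tag)
            obs[index] = 1.0
    return obs
```

With `tags = ["road", "obstacle"]` on a 3x3 grid, a grid that sees one road cell and one obstacle cell produces 18 values. An empty grid also produces 18 values, all zeros. The shape is unchanged, so in this model an empty grid alone would not change any downstream concatenation shapes.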

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.