Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents


Variable sensors shape #4686

Closed ValerioB88 closed 2 years ago

ValerioB88 commented 3 years ago

Hello. I am trying to create agents that can pass a variable number of observations. One way to do it would be to always compute and pass the maximum number of observations and then filter them in Python once received. However, when there are visual observations this becomes computationally expensive and inefficient. It would be more appropriate to be able to choose programmatically, at every time step, which observations to send.
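To make the "always send the maximum, then filter in Python" approach concrete, here is a minimal sketch. It assumes the environment also sends one boolean flag per observation slot; the function name `filter_observations` and the flag list are hypothetical and not part of the ML-Agents API.

```python
from typing import List, Sequence


def filter_observations(
    observations: Sequence[Sequence[float]],
    active: Sequence[bool],
) -> List[List[float]]:
    """Keep only the observation slots whose flag is True.

    `active` is an assumed side channel: one flag per slot, sent with the
    fixed-size batch of observations.
    """
    if len(observations) != len(active):
        raise ValueError("observations and active flags must have the same length")
    return [list(obs) for obs, keep in zip(observations, active) if keep]


# Hypothetical step: three fixed slots were sent, but only slots 0 and 2 matter.
received = [[0.1, 0.2], [0.0, 0.0], [0.5, 0.6]]
flags = [True, False, True]
kept = filter_observations(received, flags)
```

Every slot is still computed and transmitted before being discarded, which is the cost the post objects to for visual observations.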

After exploring the code, I think the easy way to do that would be to add an attribute to the ISensor interface, called isActive. Then, in Agent.SendInfoToBrain, when calling m_Brain.RequestDecision(m_info, sensors), we would keep only the sensors whose isActive is true.

Alternatively, the user could specify with a checkbox in the Unity Editor whether they want to use variable-size observations. If this flag is true, then isActive is checked; otherwise it is ignored altogether, with very little computational overhead.
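The proposal above can be sketched in a language-neutral way. The snippet below is Python rather than the C# the real change would need, and everything in it is hypothetical: `Sensor` stands in for an ISensor implementation, `is_active` for the proposed isActive attribute, and `use_variable_observations` for the proposed editor checkbox. None of these exist in ML-Agents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sensor:
    """Hypothetical stand-in for an ISensor implementation."""
    name: str
    is_active: bool = True  # the proposed per-step flag


def sensors_to_send(sensors: List[Sensor], use_variable_observations: bool) -> List[Sensor]:
    """Return the sensors that would be passed on to the decision request.

    When the (assumed) editor flag is off, is_active is never inspected,
    so the fixed-size path pays no per-sensor check.
    """
    if not use_variable_observations:
        return list(sensors)
    return [s for s in sensors if s.is_active]


sensors = [Sensor("camera", is_active=False), Sensor("vector")]
selected = sensors_to_send(sensors, use_variable_observations=True)
```

With the flag off, both sensors are returned unchanged, matching the existing fixed-size behavior.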

I am happy to work on this straight away if approved. Otherwise I will just create my own sensor class.

andrewcoh commented 3 years ago

Hi @ValerioB88

This is a feature that we are planning to add ourselves in the next month or two but we haven't yet decided on the best way to implement it.

Thank you for showing your interest in this feature. Can you share your use case?

ValerioB88 commented 3 years ago

Sure. I am running a computer vision experiment, an object recognition task (not strictly RL). At each iteration I am passing Y frames of X objects. The network is an RNN that learns to recognize these objects. I want the network to be trained on a variable number of frames per object.

I did some research in the code today and it doesn't seem possible to implement this feature by extending the classes in ml-agents. The perfect place to check whether or not to pass a sensor is Agent.SendInfoToBrain. My idea was to inherit from Agent and reimplement that function with a check on the sensor object (keeping everything the same, literally copying and pasting everything, but checking the sensors passed to RequestDecision). But that function is full of private attributes, so I can't just copy it into my inherited agent. I can't even call the base implementation, because that would send the decision anyway. So my only option is to change Agent itself, which is.. bad. :(

RedTachyon commented 3 years ago

Hi, I was actually really hoping this would exist in ML-Agents. My use case is for multiagent scenarios - I want an agent to observe whatever other agents are nearby, which can vary over time.

My current workaround is just getting all the nearby agents and selecting the N closest ones, or padding with NaNs if there are fewer neighbors than the maximum, to make sure I don't accidentally use any placeholder values. It's... not ideal, but I guess that's what I'm stuck with for now.
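A minimal sketch of this workaround, written in Python for illustration (the actual sensor would live in C# on the Unity side). It assumes 2D positions and uses the hypothetical helper names `closest_padded` and `strip_padding`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def closest_padded(agent: Point, neighbors: List[Point], max_neighbors: int) -> List[float]:
    """Flatten the positions of the closest neighbors into a fixed-size vector.

    Keeps at most max_neighbors points, nearest first, and fills the unused
    slots with NaN so placeholder values cannot be mistaken for real data.
    """
    ordered = sorted(neighbors, key=lambda p: math.dist(agent, p))
    kept = ordered[:max_neighbors]
    flat = [coord for point in kept for coord in point]
    flat += [math.nan] * (2 * (max_neighbors - len(kept)))
    return flat


def strip_padding(flat: List[float]) -> List[float]:
    """Drop the NaN padding on the receiving side."""
    return [value for value in flat if not math.isnan(value)]


observation = closest_padded((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0)], max_neighbors=3)
```

The vector always has length `2 * max_neighbors`, so the observation shape stays fixed while the number of real entries varies.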

RedTachyon commented 3 years ago

Any news on this feature?

andrewcoh commented 3 years ago

Hi @RedTachyon

This feature has been written but unfortunately merging to master is blocked due to a dependency on a Barracuda version that is giving us problems with the export. The pull request is here https://github.com/Unity-Technologies/ml-agents/pull/4909.

If you can share some more details about what you'd like to do with this feature, I may be able to give you some advice on using this branch with the caveat that this is currently a 'use at your own risk' feature.

RedTachyon commented 3 years ago

Do you have an estimate of how long it could take until it's merged? I'm not in an extreme hurry, but that also depends on the expected timeline.

My particular use case, partially implemented in my code, is as follows: I create a new ISensor attached to an agent. At each step of the simulation, the sensor finds all colliders within a certain radius (Physics.OverlapSphere), filters them to keep only objects of a certain type, and then returns the positions of all those objects. At the moment I am padding/truncating the vector to a fixed size; ideally it would output a vector of size 2*n, where n is the number of nearby interesting objects.
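The ideal variable-length output described above can be sketched as follows. This is a plain-Python illustration, not Unity code: `SceneObject`, `observe_nearby`, and the two-coordinate positions are assumptions, and the distance check only stands in for what Physics.OverlapSphere does in the engine.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SceneObject:
    """Hypothetical stand-in for a collider with a type tag and a 2D position."""
    kind: str
    x: float
    y: float


def observe_nearby(
    origin_x: float,
    origin_y: float,
    objects: List[SceneObject],
    radius: float,
    kind: str,
) -> List[float]:
    """Return the flattened positions of all objects of `kind` within `radius`.

    The result has length 2*n, where n is the number of matching objects,
    so its size changes from step to step.
    """
    out: List[float] = []
    for obj in objects:
        in_range = math.hypot(obj.x - origin_x, obj.y - origin_y) <= radius
        if in_range and obj.kind == kind:
            out.extend((obj.x, obj.y))
    return out


scene = [
    SceneObject("agent", 1.0, 0.0),
    SceneObject("wall", 0.5, 0.5),
    SceneObject("agent", 10.0, 10.0),
]
nearby = observe_nearby(0.0, 0.0, scene, radius=2.0, kind="agent")
```

Here only one agent is both in range and of the requested type, so the output holds a single pair of coordinates.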

RedTachyon commented 3 years ago

Continuing this thread to avoid spam, because I see the PR has been merged.

So I'm not sure if I'm missing something obvious, but it seems that in the newest release, the BufferSensorComponent is internal, which seems to mean that I can't actually access it from my script? I'll probably just copy-paste the implementation or something for now, but you might want to change it to public like the other components.

andrewcoh commented 3 years ago

The PR was merged to master, but only after the most recent release was cut, so it will be in the March release. You should be able to try it on master though.
