StephenAshmore / terrarium

On Environments #2

Open StephenAshmore opened 3 years ago

StephenAshmore commented 3 years ago

I believe that we originally designed environments to fit the following constraints/requirements:

  1. Environments should be "drop-in" to the overall architecture. That is to say, we should be able to easily take multiple environments and apply any number of agents without code changes to the environments. Specifically, this took the form of an enforced interface for all environments (a minimal sketch of one follows this list).
  2. Environments should be self-contained, such that someone could write their own environment without needing to understand existing environments.
  3. Agents should be able to "drop in" to an environment and not need any custom code to work with a specific environment.
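
For concreteness, here is a minimal sketch of what such an enforced interface could look like. The class and method names (`Environment`, `reset`, `step`) and the return signature are my assumptions for illustration, not the actual terrarium code:

```python
from abc import ABC, abstractmethod

import numpy as np


class Environment(ABC):
    """Contract every environment satisfies so that any agent can be
    dropped in without environment-specific code (hypothetical sketch)."""

    @abstractmethod
    def reset(self) -> np.ndarray:
        """Reset internal state and return the initial observation."""

    @abstractmethod
    def step(self, action: np.ndarray) -> tuple[np.ndarray, float, bool]:
        """Apply the agent's action and return (observation, reward, done)."""
```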

I think a major goal of terrarium is out-of-distribution training for agents: the interface between agents and environments encodes the built-in biases and distributional assumptions that allow an agent to move from one environment to the next while still learning. This has led us down a rabbit hole of trying to figure out the agent interface: how do agents interact with an environment in a way that doesn't lead to poor program design?

I think this has resulted in each environment being required to design its own sensors, actuators, etc. That, in turn, has led to burgeoning environment directories and enough complexity around an agent's inputs and outputs that it becomes harder, not easier, to work in an environment. I think the questions we need to answer are these:

  1. Can we design an interface for interacting with an environment that does not require building every piece of the environment from scratch (i.e., can we reuse tools)? One possible shape is sketched after this list.
  2. How can we create that interface so that an agent does not need new inputs and outputs as it moves from environment to environment?
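
On question one, here is one possible shape for reusable tools, sketched with hypothetical names (none of this exists in the repo): tools as plain objects with a uniform interface, so the same tool class can be dropped into any environment rather than rebuilt per environment directory.

```python
class Tool:
    """A reusable object that lives in an environment's world state.

    Environments place tools; the agent acts on the world, and the
    environment forwards relevant effects to the tool. The agent never
    needs tool-specific inputs or outputs.
    """

    def __init__(self, name: str):
        self.name = name

    def apply(self, world_state: dict) -> dict:
        """Mutate and return the world state when the tool is used."""
        return world_state


class Lever(Tool):
    """Example tool: toggles a flag in whatever world it is placed in."""

    def __init__(self):
        super().__init__("lever")

    def apply(self, world_state: dict) -> dict:
        world_state["door_open"] = not world_state.get("door_open", False)
        return world_state
```
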
StephenAshmore commented 3 years ago

In response to question two, I am reminded of industrial robotics, where an "agent" is much simpler than what we imagine in machine learning. Industrial robots are designed to perform a task over and over again, to certain requirements of precision. When analyzing a problem and designing a robot for an industrial task, we consider the degrees of freedom that the robot needs. For a simple "pick and place" task, the robot may need only four degrees of freedom: translation on the x and z axes, height on the y axis, and a clamping mechanism (possibly vacuum suction on/off, or closing/opening a hand). If we considered the problem space of all pick-and-place tasks where the robot had only those four degrees of freedom, we could design an environment and tool interface that any agent could conform to. We would never need to worry about adding another actuator, as we've limited the environments to that many degrees of freedom.
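
As a purely hypothetical illustration (the field names and the four-degree decomposition below are mine, not anything from the repo), the entire action space for that family of tasks could be pinned down as one fixed-width command vector:

```python
from dataclasses import dataclass


@dataclass
class PickAndPlaceAction:
    """Fixed action space for the hypothetical pick-and-place family.

    Every environment in the family consumes exactly these four degrees
    of freedom; no environment may add or remove a field.
    """

    dx: float      # translation along the x axis
    dz: float      # translation along the z axis
    dy: float      # height along the y axis
    clamp: bool    # True = close/suction on, False = open/suction off

    def to_vector(self) -> list[float]:
        # The agent only ever emits a flat, fixed-width command vector.
        return [self.dx, self.dz, self.dy, 1.0 if self.clamp else 0.0]
```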

We could do something similar for our environments: enforce that each agent has a fixed set of sensors across all environments (much as a human has only sight, smell, etc.) and a fixed number of actuators. All environments would need to send data to, and receive commands from, the agent in that required format. Tools would then be objects that exist in the environment, and the agent need not know about them; the agent could learn to use the tools, or it could ignore them. Tools could also be shared across environments more easily. Further, an environment would simply need to map its internal state to a vector for the agent to consume, and then map the agent's command vector back to effects in the environment.
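
A minimal sketch of that mapping, assuming made-up fixed vector widths and a toy environment (every constant and name below is illustrative only):

```python
import numpy as np

# Hypothetical fixed-width contract: every environment, no matter what it
# simulates, exchanges vectors of exactly these sizes with the agent.
SENSOR_DIM = 64     # assumed width of the shared observation vector
ACTUATOR_DIM = 8    # assumed width of the shared command vector


class GridWorld:
    """Toy environment honoring the shared contract.

    The environment owns the translation between its internal state and
    the fixed-width vectors; the agent never sees environment-specific
    structure, and tools are just objects inside the world state.
    """

    def __init__(self):
        self.state = {"agent_xy": (0, 0), "tools": ["lever"]}

    def observe(self) -> np.ndarray:
        # Map internal state into the shared sensor vector; unused
        # entries stay zero so every environment emits SENSOR_DIM values.
        obs = np.zeros(SENSOR_DIM)
        obs[0], obs[1] = self.state["agent_xy"]
        obs[2] = float(len(self.state["tools"]))
        return obs

    def act(self, command: np.ndarray) -> None:
        # Map the agent's fixed-width command vector back to effects.
        assert command.shape == (ACTUATOR_DIM,)
        x, y = self.state["agent_xy"]
        self.state["agent_xy"] = (x + int(command[0]), y + int(command[1]))
```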