StephenAshmore opened 3 years ago
In response to question two, I am reminded of industrial robotics, where an "agent" is much simpler than what we imagine in machine learning. Industrial robots are designed to perform a task over and over again, to certain requirements of precision. When analyzing a problem and designing a robot for an industrial task, we consider the degrees of freedom the robot needs. For a simple "pick and place" task, the robot may need only four degrees of freedom: translation on the x and z axes, height on the y axis, and a clamping mechanism (possibly vacuum suction on/off, or closing/opening a hand). If we considered the problem space of all pick-and-place tasks where the robot had only those four degrees of freedom, we could design an environment and tool interface that any agent could conform to. We would never need to worry about adding another actuator, because we've limited the environments to that many degrees of freedom.
We could do something similar for our environments. Enforce upon each agent that it will have a certain set of sensors for all environments (just as a human has only sight, smell, and so on), and a certain number of actuators. All environments would need to be able to send data to, and receive commands from, the agent in the required format. Tools would then be objects that exist in the environment, and the agent need not know about them. The agent could learn to use the tools, or it could ignore them. Tools could also be shared between environments more easily. Further, environments would simply need to map their internal state to a vector for the agent to consume, and map the agent's command vector to effects in the environment.
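To make the idea concrete, here is a minimal sketch of what such a fixed agent/environment interface might look like. All names here (`FixedInterfaceEnv`, `SENSOR_DIM`, `ACTUATOR_DIM`, the `PickAndPlace` subclass) are hypothetical illustrations, not existing code in the repo:

```python
import numpy as np

# Hypothetical fixed interface: every environment exposes exactly
# SENSOR_DIM observation values and accepts exactly ACTUATOR_DIM
# command values, regardless of what tools or objects exist inside it.
SENSOR_DIM = 16
ACTUATOR_DIM = 4  # e.g. x, z, y translation deltas plus a gripper channel


class FixedInterfaceEnv:
    """Sketch of a base class: environments map internal state to a
    fixed-size sensor vector, and a fixed-size command vector to effects."""

    def observe(self) -> np.ndarray:
        # Map whatever the internal state is into the agreed vector shape.
        state = np.asarray(self._internal_state(), dtype=float)
        vec = np.zeros(SENSOR_DIM)
        n = min(len(state), SENSOR_DIM)
        vec[:n] = state[:n]
        return vec

    def act(self, command: np.ndarray) -> None:
        assert command.shape == (ACTUATOR_DIM,)
        self._apply(command)  # environment-specific interpretation

    def _internal_state(self):
        raise NotImplementedError

    def _apply(self, command):
        raise NotImplementedError


class PickAndPlace(FixedInterfaceEnv):
    """Toy pick-and-place environment using the four degrees of freedom
    from the robotics analogy above."""

    def __init__(self):
        self.position = np.zeros(3)   # x, z, y
        self.gripper_closed = False

    def _internal_state(self):
        return np.append(self.position, float(self.gripper_closed))

    def _apply(self, command):
        self.position += command[:3]          # translation deltas
        self.gripper_closed = command[3] > 0  # clamp on/off
```

The point is that an agent written against `observe()`/`act()` never sees the environment's internals; tools would just be objects whose effects show up through the same fixed-size vectors.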
I believe that we originally designed environments to fit the following constraints/requirements:
I think a major goal of terrarium is out-of-distribution training for agents: the agent/environment interface would carry the built-in biases and distributional assumptions that allow an agent to move from one environment to the next while still learning. This has led us down a rabbit hole of trying to figure out the interface for agents: how do they interact with the environment in a way that doesn't lead to poor programming design? I think this has resulted in each environment having to design its own sensors, actuators, etc., which in turn has produced the burgeoning environment directories and a level of complexity around an agent's inputs and outputs that makes it harder, not easier, to work in an environment. I think the question that we need to answer is this: