About this project
-
The United Unity Universe (U3) is a framework for building and iterating on reinforcement learning environments in Unity 3D, using both Unity and Python. Its design follows the conventions presented in the DeepMind paper "Using Unity to Help Solve Intelligence." U3 targets machine learning researchers: it offers interfaces for integrating diverse code efficiently, so that new environments can be implemented and experiments designed to test the limits of an agent's capabilities. U3 also includes predefined environments such as GridWorld and OpenXLand, and researchers are encouraged to contribute their own custom environments or modifications to the existing codebase.
-
The first goal of the project is to reproduce XLand and Ada. This step serves as a benchmark to confirm that everything runs smoothly. A working version of Ada will also allow us to test its capabilities. Specifically, I am interested in using the Ada model as a foundational model for novel tasks and in testing its zero-shot and few-shot capabilities. For example, can Ada adapt to a reskinned environment (only the textures/models change)? Can it adapt to a new set of rules/objects? Can it be used as a foundational model on new rules/objects? Can it then adapt to more complex tasks that require a combination of rules (such as using an object to build a bridge across a gap)? Finally, can Ada be used as a foundational model for a completely different environment (new tasks, new visuals, new rewards) as long as the basic controller is the same?
More details: https://docs.google.com/document/d/1iQD3saJDTD4fPgFONWtWWoQOIJ4glgURXhtg5951-VQ/edit
Desiderata
- Dynamic environment setup and probing: ML researchers need the ability to customize and probe environments to evaluate task difficulty and select environments appropriate for the current policy. U3 enables complete environmental customization from the Python side before an episode starts, allowing researchers to find or create an appropriate environment instance for training.
- Dynamic environment loading: Saving and loading environments during testing or environment setup is useful for debugging and improving models. U3 makes it easy to serialize environments, with a focus on interpretability rather than speed. Serialization also supports experiments that test how a trained agent reacts to specific situations in the environment.
- Multiple independent environments in a single Unity instance: Running many environment instances in the same Unity process spreads the overhead of the Unity engine across them. U3 also allows a single Unity instance to run distinct environments (such as a GridWorld and a 3D environment) simultaneously.
- Multiple agents in a single environment: Multi-agent environments are useful both for multi-agent research and for modifying task difficulty. U3 supports multiple agents in a single environment using the PettingZoo interface. The decision interval for each agent is not fixed, allowing for greater flexibility in training models.
- Python interface using Docker, Gym/PettingZoo, and a simple API for environmental manipulations: U3 is designed to be a community tool, so it uses existing standards for all interfaces into the framework. The environmental API is environment-specific, but basic functionality such as serialization uses a standardized JSON format. Docker is used for the Unity instances to improve reproducibility and simplify setup.
- Modular code design: To make U3 accessible to as many researchers as possible, the framework encourages modular code using Unity components. U3 comes with several basic environments already defined, but users can create new objects and add new features to those objects using components. This means that community code can be mixed and matched to fit the unique requirements of each project with minimal coding.
- Tools for environmental initialization: The ability to randomize environment layouts is important for creating diverse and challenging environments for training models. U3 provides a number of tools for environmental initialization, such as wave function collapse and compositional pattern-producing networks.
- Tools for human experiments: To compare models against a human-level baseline, or to gather expert trajectories from human players, U3 provides a simple human interface both as a standalone application and as a web plugin.
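
The customization and serialization points above can be illustrated with a small sketch. It does not show U3's actual schema or API: the `ObjectSpec` and `EnvironmentSpec` classes, their fields, and the `to_json`/`from_json` helpers are hypothetical names chosen for illustration, and only the Python standard library is used. The idea is that an environment description is built on the Python side, written as indented JSON so a person can read it, and restored to an equal object.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ObjectSpec:
    """One object placed in the environment (hypothetical schema)."""
    kind: str
    position: list  # [x, y, z] in world units


@dataclass
class EnvironmentSpec:
    """Hypothetical description of one environment instance."""
    name: str
    seed: int
    objects: list = field(default_factory=list)

    def to_json(self) -> str:
        # Indented, key-sorted output favors readability over speed.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "EnvironmentSpec":
        data = json.loads(text)
        objects = [ObjectSpec(**obj) for obj in data["objects"]]
        return cls(name=data["name"], seed=data["seed"], objects=objects)


# Customize the environment before an episode, then save and reload it.
spec = EnvironmentSpec(name="GridWorld", seed=7)
spec.objects.append(ObjectSpec(kind="goal", position=[3, 0, 4]))

saved = spec.to_json()
restored = EnvironmentSpec.from_json(saved)
```

Because the saved form is plain JSON, a specific situation can also be written or edited by hand and loaded to see how a trained agent reacts to it.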
Implementation Details
The project is split into a Unity side and a Python side. The Unity side contains the code that sets up RL environments within the Unity game engine itself. The Python side contains the wrappers for interfacing with the environment during training: a PettingZoo wrapper plus extra API calls that enable U3-specific functionality (such as probing environment complexity).
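
As a rough illustration of the Python side, the sketch below mimics the shape of PettingZoo's parallel API (`reset` returning observations and infos; `step` taking a dict of actions and returning observations, rewards, terminations, truncations, and infos) without importing PettingZoo or talking to Unity. `ToyU3Env`, `probe_complexity`, and the toy one-dimensional dynamics are hypothetical stand-ins, not U3's real classes or calls.

```python
import random


class ToyU3Env:
    """Pure-Python stand-in for a Unity-backed environment (hypothetical)."""

    def __init__(self, agents, track_length=5, max_steps=20):
        self.possible_agents = list(agents)
        self.track_length = track_length
        self.max_steps = max_steps
        self.agents = []
        self._positions = {}
        self._steps = 0

    def probe_complexity(self):
        # Hypothetical U3-specific call: a crude difficulty proxy,
        # here just the track length divided by the number of agents.
        return self.track_length / len(self.possible_agents)

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.agents = list(self.possible_agents)
        self._steps = 0
        self._positions = {a: rng.randrange(self.track_length) for a in self.agents}
        observations = dict(self._positions)
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        # Actions are -1, 0, or +1 along a 1-D track; the goal is the last cell.
        self._steps += 1
        goal = self.track_length - 1
        observations, rewards, terminations, truncations, infos = {}, {}, {}, {}, {}
        for agent in self.agents:
            move = actions.get(agent, 0)
            pos = min(max(self._positions[agent] + move, 0), goal)
            self._positions[agent] = pos
            observations[agent] = pos
            rewards[agent] = 1.0 if pos == goal else 0.0
            terminations[agent] = pos == goal
            truncations[agent] = self._steps >= self.max_steps
            infos[agent] = {}
        # Agents that terminated or were truncated leave the active list.
        self.agents = [
            a for a in self.agents if not (terminations[a] or truncations[a])
        ]
        return observations, rewards, terminations, truncations, infos


env = ToyU3Env(agents=["agent_0", "agent_1"])
difficulty = env.probe_complexity()  # probe before starting an episode
observations, infos = env.reset(seed=0)
while env.agents:
    actions = {agent: 1 for agent in env.agents}  # always move toward the goal
    observations, rewards, terminations, truncations, infos = env.step(actions)
```

Keeping the per-agent dictionaries keyed by agent name is what lets a training loop handle agents that finish at different times.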