broadinstitute / AutoTrain

Using RL to solve overfitting in neural networks

Accelerate environment-agent interactions #11

Open jccaicedo opened 4 years ago

jccaicedo commented 4 years ago

For training an effective agent, we probably need to explore on the order of 100K to 1M transitions and collect them in the replay memory. Collecting a single state in our environment can be expensive, since it involves training another network for roughly one epoch. This will be especially challenging if we aim to use the approach on large datasets.

Our goal for now is to focus on small datasets to demonstrate the potential of the approach. Thus, we want to make sure that we can run enough experiments to evaluate a few different conditions. To run small-scale experiments, we will probably need to make the most of the hardware we have and push it to its limits. One of the advantages of Reinforcement Learning is that it can be parallelized in many different ways, and this is something we could exploit. For an example, see this preprint: https://arxiv.org/abs/1507.04296

Here are some ideas that we could implement to improve the throughput of our environment and agent.

  1. Load the entire training set into GPU memory to minimize data transfers on the environment side. Here is a discussion about the topic; the trick seems to be to set workers=0 (a minimal sketch follows this list).
  2. We can also try running more than one environment at a time, i.e. environment parallelism. By running n environments simultaneously on different GPUs, each interacting with a copy of the agent, we can collect transitions into the replay memory faster than waiting for a single environment to respond (see the second sketch below).
  3. Not sure if this makes sense, but perhaps we can store the transitions on disk and reuse them in later experiments (just using pickle files or something like that). Or we could simulate a bunch of random transitions offline to populate the replay memory as quickly as possible before starting to train the agent (see the third sketch below).
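
For idea 1, here is a minimal sketch, assuming a PyTorch setup and MNIST as the small dataset (where the relevant DataLoader argument is num_workers=0): the full training set is moved to GPU memory once, so no worker subprocesses touch the CUDA tensors and no host-to-device copies happen inside the loop.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load MNIST once on the CPU, then keep the full tensors resident on the GPU.
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
images = torch.stack([img for img, _ in mnist]).to(device)  # (60000, 1, 28, 28)
labels = torch.as_tensor(mnist.targets).to(device)          # (60000,)

# num_workers must be 0: worker subprocesses cannot safely handle CUDA tensors.
train_set = TensorDataset(images, labels)
loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=0)

for batch_images, batch_labels in loader:
    # Batches are already on the GPU; no .to(device) call is needed per step.
    pass  # the environment's inner training step goes here
```

For idea 2, a rough sketch of environment parallelism: one process per GPU runs its own copy of the environment and of the current policy, and pushes transitions into a queue that the main process drains into the replay memory. Env, make_policy and select_action are hypothetical placeholders for our actual environment and agent code, and the transition format is assumed to be (state, action, reward, next_state, done).

```python
import torch.multiprocessing as mp

def env_worker(gpu_id, policy_weights, queue, steps_per_worker):
    """Run one environment copy on its own GPU and push transitions to the queue."""
    env = Env(device=f"cuda:{gpu_id}")             # hypothetical environment class
    policy = make_policy(device=f"cuda:{gpu_id}")  # hypothetical agent constructor
    policy.load_state_dict(policy_weights)

    state = env.reset()
    for _ in range(steps_per_worker):
        action = select_action(policy, state)        # e.g. epsilon-greedy
        next_state, reward, done = env.step(action)  # expensive: ~1 epoch of training
        # Move any CUDA tensors to CPU before queueing them across processes.
        queue.put((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

def collect_parallel(policy, replay_memory, n_gpus=4, steps_per_worker=250):
    """Fill the replay memory using one environment process per GPU."""
    ctx = mp.get_context("spawn")  # spawn is required for CUDA in subprocesses
    queue = ctx.Queue()
    # Share CPU copies of the weights so each worker can load them on its own GPU.
    weights = {k: v.cpu() for k, v in policy.state_dict().items()}
    workers = [ctx.Process(target=env_worker,
                           args=(gpu, weights, queue, steps_per_worker))
               for gpu in range(n_gpus)]
    for w in workers:
        w.start()
    for _ in range(n_gpus * steps_per_worker):
        replay_memory.push(*queue.get())  # blocks until a transition arrives
    for w in workers:
        w.join()
```

For idea 3, a small sketch of persisting transitions with pickle and reloading them to seed the replay memory of a later experiment; replay_memory.push and the transition tuple format are assumptions about our own code.

```python
import pickle
from pathlib import Path

def save_transitions(transitions, path="transitions.pkl"):
    """Persist a list of (state, action, reward, next_state, done) tuples."""
    with open(path, "wb") as f:
        pickle.dump(transitions, f)

def preload_replay_memory(replay_memory, path="transitions.pkl"):
    """Seed the replay memory with transitions saved by a previous experiment."""
    if Path(path).exists():
        with open(path, "rb") as f:
            for transition in pickle.load(f):
                replay_memory.push(*transition)  # assumed replay-memory API
```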

All this is up for discussion and debate :)

jccaicedo commented 4 years ago

An alternative to making the whole pipeline faster is to scale down the size of the experiments. More specifically, we can use synthetic data to understand the behavior of our solution before running larger-scale experiments. Here are some ideas:

  1. Generate toy datasets, such as a 2D mixture of Gaussians (see Fig. 2 in this paper; a minimal sketch follows this list).
  2. Use only one row and one column of each MNIST image (e.g. the center cross) instead of the full image, and simplify the network (see the second sketch below).
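
For idea 1, a minimal sketch, using numpy, that samples a 2D mixture of Gaussians with the components placed evenly on a circle; the number of components, radius, and spread are free parameters chosen here just for illustration.

```python
import numpy as np

def gaussian_mixture_2d(n_samples=1000, n_components=8, radius=2.0, std=0.1, seed=0):
    """Sample 2D points from Gaussians placed evenly on a circle; return points and component labels."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(n_components) / n_components
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (k, 2)
    labels = rng.integers(n_components, size=n_samples)
    points = centers[labels] + std * rng.standard_normal((n_samples, 2))
    return points.astype(np.float32), labels
```

For idea 2, a sketch of reducing each MNIST image to its center row and column, which turns the 784-dimensional input into 56 values and should let a much smaller network train in a fraction of the time (again assuming a PyTorch setup).

```python
import torch
from torchvision import datasets, transforms

def center_cross(images):
    """Reduce (N, 1, 28, 28) images to their center row and column: (N, 56)."""
    row = images[:, 0, 14, :]  # center row, 28 values per image
    col = images[:, 0, :, 14]  # center column, 28 values per image
    return torch.cat([row, col], dim=1)

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
images = torch.stack([img for img, _ in mnist])  # (60000, 1, 28, 28)
features = center_cross(images)                  # (60000, 56)
```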

Using synthetic data is of great value when studying complex systems from a theoretical point of view, and it removes the burden of building scalable systems at the beginning of the project. Synthetic data was used to study RNNs and GANs around 2014 to 2017, before the fundamentals were understood well enough to build more complex systems. Even in RL, toy tasks existed before tackling the Atari games, or even harder problems such as Go. We may want to give synthetic data a try.