jccaicedo opened this issue 4 years ago
An alternative to making the whole thing faster is to scale down the size of the experiments. More specifically, we can use synthetic data to understand the behavior of our solution before running larger-scale experiments. Here are some ideas:
Using synthetic data is of great value when studying complex systems from a theoretical point of view, and it removes the burden of building scalable systems at the beginning of the project. Synthetic data was used to study RNNs and GANs around 2014 to 2017, before the fundamentals were understood well enough to create more complex systems. Even in RL, toy tasks existed before tackling the Atari games or even harder problems such as Go. We may want to give synthetic data a try.
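As a concrete (hypothetical) example of what a synthetic stand-in could look like, here is a minimal sketch that generates a small, cheap-to-train-on classification dataset for debugging the agent/environment loop. The function name and all sizes are illustrative assumptions, not something from our codebase:

```python
# Hypothetical sketch: a tiny synthetic classification dataset that can stand in
# for the real data while we debug the agent/environment loop.
import numpy as np
from sklearn.datasets import make_classification

def make_toy_dataset(n_samples=1000, n_features=20, n_classes=2, seed=0):
    """Generate a small, cheap-to-train-on dataset with known structure."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=5,
        n_classes=n_classes,
        random_state=seed,
    )
    return X.astype(np.float32), y

if __name__ == "__main__":
    X, y = make_toy_dataset()
    print(X.shape, np.bincount(y))  # e.g. (1000, 20) [~500 ~500]
```

Because the dataset is tiny and its structure is known, one training epoch of the downstream network takes seconds, so the agent's behavior can be inspected quickly before moving to real data.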
For training an effective agent, we probably need to explore on the order of 100K to 1M transitions and collect them in the replay memory. Collecting a single state in our environment can be expensive, as it involves training another network for approximately one epoch. This can be especially challenging if we aim to use this approach with large datasets.
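For reference, here is a minimal sketch of what the replay memory could look like, assuming transitions are simple (state, action, reward, next_state, done) tuples; the class and field names are placeholders, not our actual format:

```python
# Hypothetical sketch of a fixed-size replay memory; the transition layout
# (state, action, reward, next_state, done) is an assumption.
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        # deque with maxlen evicts the oldest transitions automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform random minibatch for the agent's updates
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))  # (states, actions, rewards, next_states, dones)

    def __len__(self):
        return len(self.buffer)
```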
Our goal for now is to focus on small datasets to demonstrate the potential of the approach. Thus, we want to make sure that we can run enough experiments to evaluate a few different conditions. To run small-scale experiments, we will probably need to squeeze the hardware we have and push it to its limits as much as we can. One of the advantages of Reinforcement Learning is that it can be parallelized in many different ways, and this is something we could exploit. For an example, see this preprint: https://arxiv.org/abs/1507.04296
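As a rough sketch (not a definitive implementation) of what parallel transition collection could look like: several worker processes, each stepping its own copy of the environment (which in practice could be pinned to a different GPU), push transitions into a shared queue that the main process drains into the replay memory. `ToyEnv`, `env_worker`, and the gym-style step interface are all hypothetical placeholders for our real code:

```python
# Rough sketch: worker processes each step their own environment copy and push
# transitions into a shared queue; the main process drains it into the replay memory.
# ToyEnv is a trivial stand-in; our real environment would train a network per step.
import multiprocessing as mp
import random

class ToyEnv:
    def reset(self):
        return 0.0

    def step(self, action):
        # returns (next_state, reward, done, info), gym-style
        return random.random(), random.random(), random.random() < 0.05, {}

def env_worker(worker_id, queue, n_steps):
    env = ToyEnv()  # real code: build the environment here, e.g. on GPU `worker_id`
    state = env.reset()
    for _ in range(n_steps):
        action = 0  # real code: query a local copy of the agent
        next_state, reward, done, _ = env.step(action)
        queue.put((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

if __name__ == "__main__":
    n_workers, steps_per_worker = 4, 1000
    queue = mp.Queue()
    workers = [mp.Process(target=env_worker, args=(i, queue, steps_per_worker))
               for i in range(n_workers)]
    for w in workers:
        w.start()
    # Drain transitions (here into a list for illustration; in practice, the replay memory).
    transitions = [queue.get() for _ in range(n_workers * steps_per_worker)]
    for w in workers:
        w.join()
    print(len(transitions))  # 4000
```

The preprint above distributes actors and learners across machines; this sketch is only a simplified single-machine analogue of the same idea.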
Here are some ideas that we could implement to improve the throughput of our environment and agent.
For example, by running multiple environments simultaneously in different GPUs, each interacting with a copy of the agent, we can collect transitions in the replay memory faster than waiting for a single environment to respond.

All this is up for discussion and debate :)