We also want to point out that maximizing training efficiency
on a single machine is equally important for distributed systems. In fact, Sample Factory can be used as a single node in
a distributed setup, where each machine has a sampler and
a learner. The learner computes gradients based on locally
collected experience only, and learners on multiple nodes
can then synchronize their parameter updates after every
training iteration, akin to DD-PPO (Wijmans et al., 2020).
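
The synchronization step described above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not Sample Factory's or DD-PPO's actual code: it assumes that "synchronizing parameter updates" means averaging gradients across nodes, it replaces the RL objective with a toy linear regression, and all names (`local_gradient`, `all_reduce_mean`, `train_synchronously`) are hypothetical. In a real multi-node setup the averaging would be done by a collective operation such as `torch.distributed.all_reduce` rather than a Python loop.

```python
import random


def local_gradient(weights, batch):
    """Gradient of the mean squared error of a linear model on one node's local batch."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(batch)
    return grad


def all_reduce_mean(grads):
    """Stand-in for a collective all-reduce: element-wise mean over all nodes."""
    num_nodes = len(grads)
    return [sum(component) / num_nodes for component in zip(*grads)]


def train_synchronously(num_nodes=4, iterations=100, lr=0.5, seed=0):
    """Simulate several learner nodes that stay in lockstep (hypothetical toy setup)."""
    rng = random.Random(seed)
    true_weights = [2.0, -3.0]  # assumption: toy regression target, not an RL task
    # Every node starts from identical parameters.
    replicas = [[0.0, 0.0] for _ in range(num_nodes)]

    for _ in range(iterations):
        grads = []
        for weights in replicas:
            # Each node uses only its own locally collected samples.
            batch = []
            for _ in range(16):
                x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
                y = sum(w * xi for w, xi in zip(true_weights, x))
                batch.append((x, y))
            grads.append(local_gradient(weights, batch))

        # Synchronize after every training iteration: all nodes apply the same
        # averaged gradient, so their parameters never diverge.
        mean_grad = all_reduce_mean(grads)
        for weights in replicas:
            for i, g in enumerate(mean_grad):
                weights[i] -= lr * g

    return replicas


if __name__ == "__main__":
    for node_id, weights in enumerate(train_synchronously()):
        print(node_id, weights)
```

Because every replica starts from the same parameters and applies the same averaged gradient, the replicas remain identical without ever exchanging the collected experience itself; only gradients cross node boundaries.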
Implement the idea from the Sample Factory paper: