There are a number of reasons why we chose not to expose a batched API for the writers and readers. The main one is that, unlike NN compute, the performance of a system like Reverb is not limited by memory bandwidth, so batching frames in storage and during I/O is not always advantageous. In these cases, we prefer the Writers to handle one "environment" at a time and to perform batching/chunking of data inside Reverb's client and server. Similarly, when we read from the server we use custom control flow to minimize latency in the sampler and rely on `tf.data` to batch data from multiple sampler threads. As we improve the system, the control-flow algorithms improve and performance improves even if the API does not expose batching.
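To make that reading pattern concrete, here is a minimal sketch of interleaving several Reverb sampler streams with `tf.data` and batching across them. The server address, table name, batch size, and worker count are placeholders, and `from_table_signature` assumes the table was created with a signature; see the ACME dataset linked below for a complete implementation.

```python
import reverb
import tensorflow as tf

SERVER_ADDRESS = 'localhost:8000'  # placeholder
TABLE = 'my_table'                 # placeholder
BATCH_SIZE = 64
NUM_WORKERS = 4  # number of parallel sampler connections

def _make_one_sampler(_):
  # Each call opens an independent sampler stream to the server.
  return reverb.TrajectoryDataset.from_table_signature(
      server_address=SERVER_ADDRESS,
      table=TABLE,
      max_in_flight_samples_per_worker=2 * BATCH_SIZE)

# Interleave the sampler streams and batch elements across them.
dataset = tf.data.Dataset.range(NUM_WORKERS).interleave(
    map_func=_make_one_sampler,
    cycle_length=NUM_WORKERS,
    num_parallel_calls=NUM_WORKERS,
    deterministic=False)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
```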
The current suggestion is to have a separate writer for each environment, and to use `tf.data` with interleaving to sample in parallel and batch data (roughly the pattern sketched above). For the latter, the ACME reverb dataset implements our currently recommended best practices. An example of the new TrajectoryWriter is also available in ACME here; a minimal sketch of the per-environment writer pattern follows.
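On the writer side, a sketch of the one-writer-per-environment pattern might look like the following. The server address, table name, trajectory length, and the random arrays standing in for a `BatchedEnv` step are all assumptions; it also assumes a Reverb server with that table is already running.

```python
import numpy as np
import reverb

NUM_ENVS = 4        # n in the question
TRAJ_LEN = 2        # hypothetical trajectory length per item
TABLE = 'my_table'  # placeholder table name

# Assumes a Reverb server serving TABLE is running at this address.
client = reverb.Client('localhost:8000')

# One TrajectoryWriter per environment in the batch.
writers = [
    client.trajectory_writer(num_keep_alive_refs=TRAJ_LEN)
    for _ in range(NUM_ENVS)
]

for step in range(10):
  # Stand-in for BatchedEnv.step(): stacked arrays of shape [n, ...].
  observations = np.random.rand(NUM_ENVS, 84, 84).astype(np.float32)
  rewards = np.random.rand(NUM_ENVS).astype(np.float32)

  for i, writer in enumerate(writers):
    # Slice out this environment's row and append it to its own writer.
    writer.append({'observation': observations[i], 'reward': rewards[i]})
    if step + 1 >= TRAJ_LEN:
      # Insert the last TRAJ_LEN steps of this environment as one item.
      writer.create_item(
          table=TABLE,
          priority=1.0,
          trajectory={
              'observation': writer.history['observation'][-TRAJ_LEN:],
              'reward': writer.history['reward'][-TRAJ_LEN:],
          })

for writer in writers:
  writer.flush()
  writer.close()
```

With this layout the server sees `n` independent single-environment streams, and the `[T, batch_size, ...]` shape the learner needs is produced on the read side by the `tf.data` batching shown earlier.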
If you have particular performance issues, please let us know by filing a separate issue so that we can help you maximize your throughput / minimize latency for your case.
Hi,
From the tutorial, there are only examples of adding trajectories to the `reverb` server from a single environment. However, a more common setting in RL is to have a `BatchedEnv` holding multiple environments (say, `n`). In this case, on the actor side the shape of a tensor would be `[n, single_shape]`, but on the learner side we need a batch of samples of shape `[T, batch_size, single_shape]`. My question is: what is the best practice for this?