Reverb Replay Buffers + Vectorized/Batched Environments

I'm trying to use a Reverb replay buffer with a batched environment such as 'envpool', where the API returns a batch of experience whenever either .reset or .step is called.

I'm guessing there must be a better way to insert that data into the buffer than to keep one writer per environment and iterate over the writers, appending to each one its own index of the batch.

The code below is clearly suboptimal: looping over the environments in Python on every step defeats the purpose of using a vectorized environment, as opposed to many workers each executing a single environment.

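num_envs = 100
envs = make_envs(num_envs)
# One writer per environment.
writers = [client.writer() for _ in range(num_envs)]
obs = envs.reset()
# obs.shape == (100, 3, 86, 86): a batch of 100 Atari observations

while True:
    # `action` is the batch of actions chosen by the policy (selection not shown).
    next_obs, reward, done, info = envs.step(action)
    # next_obs.shape == (100, 3, 86, 86)
    for i, writer in enumerate(writers):
        # Each writer receives only the slice belonging to its environment.
        writer.append({
            'obs': obs[i],
            .....
        })
    # Advance the observation once per step, after every writer has been fed.
    obs = next_obs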

If there are any examples of working with batched environments and Reverb in the codebase, or if anyone could provide some direction, I'd greatly appreciate it.
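
To make the per-environment slicing concrete, here is a minimal sketch of that step on its own. `unbatch_step` is a hypothetical helper written for illustration; it is not part of Reverb, Acme, or envpool, and the field names and tiny list-based batch are assumptions standing in for real observations.

```python
from typing import Any, Dict, List, Mapping, Sequence


def unbatch_step(
    batched: Mapping[str, Sequence[Any]], num_envs: int
) -> List[Dict[str, Any]]:
    """Split one batched step into one dict per environment.

    `batched` maps field names (e.g. 'obs', 'reward') to sequences whose
    leading dimension is the environment index. Plain lists and NumPy
    arrays both work, since only len() and integer indexing are used.
    """
    for key, value in batched.items():
        if len(value) != num_envs:
            raise ValueError(
                f"field {key!r} has leading size {len(value)}, expected {num_envs}"
            )
    # Entry i of the result holds the i-th slice of every field.
    return [{key: value[i] for key, value in batched.items()} for i in range(num_envs)]


# Hypothetical stand-in for a batched step from 3 environments.
batched_step = {
    "obs": [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]],
    "reward": [0.0, 1.0, -1.0],
    "done": [False, False, True],
}
per_env = unbatch_step(batched_step, num_envs=3)
```

Each `per_env[i]` is what would be handed to `writers[i].append(...)` in the loop above. This only isolates the slicing; it still touches every environment in Python on each step, which is the overhead I would like to avoid.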
