google-deepmind / acme

A library of reinforcement learning components and agents

Reverb Replay Buffers + Vectorized/Batched Environments #300

Open wbrenton opened 1 year ago

wbrenton commented 1 year ago

I'm trying to use a Reverb replay buffer with a batched environment like envpool, where the API returns a batch of experience whenever either .reset or .step is called.

I'm guessing there must be a better way to insert that data into the buffer than to have a writer for each individual environment and iterate over the writers, appending each one's slice of the batch.

The code below is clearly suboptimal and defeats the purpose of using a vectorized environment as opposed to many workers each executing a single environment.

num_envs = 100
envs = make_envs(num_envs)
writers = [client.writer(max_sequence_length=2) for _ in range(num_envs)]
obs = envs.reset()
# obs.shape == (100, 3, 86, 86), a batch of 100 Atari observations

while True:
    # action selection elided
    next_obs, reward, done, info = envs.step(action)
    # next_obs.shape == (100, 3, 86, 86)
    for i, writer in enumerate(writers):
        writer.append({
            'obs': obs[i],
            # ... other fields elided ...
        })
    obs = next_obs

If there are any examples of working with batched environments and Reverb in the codebase, or if anyone could provide some direction, I'd greatly appreciate it.

ethanluoyc commented 1 year ago

I think creating multiple writers is currently the way to go, as Reverb does not provide a native way of doing batched appends. There has been some discussion about supporting batched environments; see https://github.com/deepmind/reverb/issues/52
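For reference, the per-environment writer pattern looks roughly like the sketch below using Reverb's TrajectoryWriter. This is only a sketch: the table name 'transitions', the server address, the policy call, and the field names are assumptions, not something from your setup.

import reverb

num_envs = 100
client = reverb.Client('localhost:8000')   # assumes a Reverb server is already running
envs = make_envs(num_envs)                 # batched-env constructor from your snippet
writers = [client.trajectory_writer(num_keep_alive_refs=2) for _ in range(num_envs)]
steps = [0] * num_envs                     # steps appended per sub-environment

obs = envs.reset()
while True:
    action = policy(obs)                   # placeholder policy
    next_obs, reward, done, info = envs.step(action)
    for i, writer in enumerate(writers):
        writer.append({'obs': obs[i], 'action': action[i], 'reward': reward[i]})
        steps[i] += 1
        if steps[i] >= 2:
            # Build an (s, a, r, s') transition from the last two appended steps.
            writer.create_item(
                table='transitions',
                priority=1.0,
                trajectory={
                    'obs': writer.history['obs'][-2],
                    'action': writer.history['action'][-2],
                    'reward': writer.history['reward'][-2],
                    'next_obs': writer.history['obs'][-1],
                })
        if done[i]:
            writer.end_episode()
            steps[i] = 0
    obs = next_obs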

If you want to instantiate multiple writers, there are some recommended setups that allow you to append to them concurrently; see https://github.com/deepmind/reverb/issues/78
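As a rough illustration (plain Python, not taken from the linked issue), the per-writer appends can be dispatched through a thread pool so that the inserts overlap rather than running strictly one after another. It assumes envs, writers, and policy from the previous sketch; whether it actually helps depends on how much time is spent in the Reverb client.

from concurrent.futures import ThreadPoolExecutor

def append_step(writer, step):
    # Appends a single environment's slice of the batched step to its writer.
    writer.append(step)

with ThreadPoolExecutor(max_workers=16) as pool:
    obs = envs.reset()
    while True:
        action = policy(obs)  # placeholder policy
        next_obs, reward, done, info = envs.step(action)
        futures = [
            pool.submit(append_step, writer,
                        {'obs': obs[i], 'action': action[i], 'reward': reward[i]})
            for i, writer in enumerate(writers)
        ]
        for f in futures:
            f.result()  # surface any exceptions before continuing
        obs = next_obs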

If you want to use multiple workers, I think the recommended workflow is to use the Launchpad library and the distributed experiment setup. You should be able to find some examples of how to do that in the codebase.
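Roughly, the distributed path looks like the sketch below, which turns a single-process experiment config into a Launchpad program with several actor workers, each running its own environment instance. The make_experiment_config helper is a hypothetical stand-in for however you build your ExperimentConfig; check the acme examples for the exact setup.

import launchpad as lp
from acme.jax import experiments

# Build an ExperimentConfig as in the single-process examples (builder,
# network factory, environment factory, etc.); details omitted here.
experiment = make_experiment_config()  # hypothetical helper

# Expand the experiment into a Launchpad program with a learner, a replay
# server, and multiple actor workers.
program = experiments.make_distributed_experiment(
    experiment=experiment,
    num_actors=16,
)
lp.launch(program)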