Closed wbrenton closed 1 year ago
Hey,
Not really sure what you are actually asking, but the type of environment should not really have much impact on how you use Reverb.
@acassirer Vectorized environments meaning you interact with a batch of environments every time you call .reset() or .step() on your environment API.
Here is a motivating example for why I think the question is worthwhile.
envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape)  # (N, 4, 86, 86)

# one trajectory writer for each env
trajectory_writers = [
    rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length)
    for _ in range(N)
]

while True:
    # actions: batch of N actions chosen by the policy (not shown)
    next_obs, rewards, dones, infos = envs.step(actions)
    # next_obs.shape = (N, 4, 86, 86)
    # rewards.shape = (N,)  # scalar rewards

    # loop over every environment and write the experience to its respective writer
    for idx in range(N):
        trajectory_writer = trajectory_writers[idx]
        trajectory_writer.append({
            'obs': obs[idx],
            'actions': actions[idx],
            'rewards': rewards[idx],
            'dones': dones[idx],
        })
        if trajectory_writer.episode_steps >= 2:
            trajectory_writer.create_item(
                table='uniform_experience_replay',
                priority=1.,
                trajectory={
                    'obs': trajectory_writer.history['obs'][:-1],
                    'next_obs': trajectory_writer.history['obs'][-1:],
                    'actions': trajectory_writer.history['actions'][:-1],
                    'rewards': trajectory_writer.history['rewards'][:-1],
                    'dones': trajectory_writer.history['dones'][:-1],
                })

    # carry the new observations into the next step
    obs = next_obs
Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way; I'm just unable to find it in the codebase.
In case it's still not 100% clear: what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a writer for each of the N environments.
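To make the data layout concrete, here is a rough NumPy-only sketch of one possible workaround: treat the whole batch as a single stream, so each step carries fields with a leading N dimension, and split by environment only after sampling. It makes no Reverb calls. stack_rollout and split_by_env are hypothetical helpers, not part of Reverb, and the dummy shapes just mirror the example above. Whether a single writer handles such batched steps well in a given setup is an assumption to verify. Each stored item would then span all N environments, so priorities and episode boundaries stop being per-environment.

```python
import numpy as np


def stack_rollout(steps):
    """Stack a list of batched step dicts into arrays of shape (T, N, ...)."""
    return {key: np.stack([step[key] for step in steps]) for key in steps[0]}


def split_by_env(rollout):
    """Split a (T, N, ...) rollout into N per-env trajectories of shape (T, ...)."""
    num_envs = next(iter(rollout.values())).shape[1]
    return [
        {key: value[:, i] for key, value in rollout.items()}
        for i in range(num_envs)
    ]


# Dummy stand-in for T steps of N vectorized environments (hypothetical data).
T, N = 3, 4
rng = np.random.default_rng(0)
steps = [
    {
        "obs": rng.random((N, 4, 86, 86), dtype=np.float32),
        "actions": rng.integers(0, 4, size=N),
        "rewards": rng.random(N, dtype=np.float32),
        "dones": np.zeros(N, dtype=bool),
    }
    for _ in range(T)
]

rollout = stack_rollout(steps)    # one batched structure per rollout
per_env = split_by_env(rollout)   # unbatch on the consumer side

print(rollout["obs"].shape)       # (T, N, 4, 86, 86)
print(per_env[0]["obs"].shape)    # (T, 4, 86, 86)
```

The trade-off in this layout is that the per-environment loop moves from the writer side to the consumer side, where it becomes a cheap slicing operation instead of N separate append calls per step.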
This is a very relevant and common use-case in modern DRL. I also came looking for an answer to this.
Ideally there would be a section in the documentation regarding batched writing of trajectories.
Also, I don't understand why this issue was closed. It's clearly not resolved.
This topic is also discussed in this other issue: https://github.com/google-deepmind/reverb/issues/78
@thomasbbrunner glad you replied to this thread, I killed so much time trying to use reverb with vectorized envs. What framework are you using (PyTorch, JAX, etc.)?
I'm using a combination of PyTorch + NumPy. I'm currently facing problems: between 50% and 90% of my rollout time is spent in Reverb, with the remainder spent stepping the environment and computing metrics.
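For anyone wanting to get a comparable breakdown, a minimal sketch of how the time split could be measured is below. SectionTimer is a hypothetical helper, and the time.sleep calls are stand-ins for the real environment step and Reverb writes, so the numbers it prints say nothing about Reverb itself.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section of the rollout loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def fractions(self):
        """Return each section's share of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


timer = SectionTimer()
for _ in range(5):
    with timer.section("env_step"):
        time.sleep(0.001)  # stand-in for envs.step(actions)
    with timer.section("replay_write"):
        time.sleep(0.002)  # stand-in for writer.append(...) / create_item(...)

print(timer.fractions())
```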
I tried using multithreading as described in https://github.com/google-deepmind/reverb/issues/72#issuecomment-937541161; however, it did not lead to improvements (probably limited by the GIL). Multiprocessing is a pain in Python, so probably not an option (data has to be picklable).
Not sure what to try next. Did you end up finding a solution for your use-case?
What are the best practices for using Reverb with a vectorized environment? Any help is appreciated, thank you.