Closed piperwolters closed 3 months ago
Thanks for your question.
Note that `bsize` in the configuration files means the number of parallel emulators on each machine. When multiple worker machines collect trajectories in parallel, the total number of parallel emulators is the sum of `bsize` on each machine.
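As a quick arithmetic check of the point above, here is a minimal sketch. The machine names and the helper `total_emulators` are hypothetical and are not part of the repository.

```python
def total_emulators(bsize_per_machine):
    """Sum the per-machine bsize values to get the total number of parallel emulators."""
    return sum(bsize_per_machine.values())


# Hypothetical setup: two worker machines, each configured with bsize=8.
workers = {"worker-0": 8, "worker-1": 8}
print(total_emulators(workers))  # 16

# A single machine contributes only its own bsize.
print(total_emulators({"local": 8}))  # 8
```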
So yes, `bsize=8` works perfectly with `parallel=single`. It also works with `parallel=worker` if you're running on multiple worker machines.
> What does the `parallel` argument do?

`parallel` specifies whether emulation runs on multiple machines. If you run emulators on a single machine, set `parallel: single`. If you run on multiple machines, set `parallel: host` for the host machine (which aggregates the trajectories and trains the model) and `parallel: worker` for the worker machines (which run emulators to collect data).
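The three values can be summarized as a small lookup. This is only an illustrative sketch: `describe_parallel` and its description strings are hypothetical, paraphrase the explanation above, and do not mirror how the repository actually reads its configuration.

```python
def describe_parallel(mode):
    """Return a short description of a machine's role for a given `parallel` value."""
    roles = {
        "single": "emulators run on this one machine",
        "host": "aggregates the trajectories and trains the model",
        "worker": "runs emulators to collect data for the host",
    }
    if mode not in roles:
        raise ValueError(f"unknown parallel mode: {mode!r}")
    return roles[mode]


for mode in ("single", "host", "worker"):
    print(f"parallel: {mode} -> {describe_parallel(mode)}")
```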
> In `batch_interact_environment`, the agent only gets an action and takes a step `if accelerator.is_main_process`. I would have thought that an agent in each emulator would want to do this?

`accelerator` is for multi-GPU training; it has nothing to do with emulation.
Thank you!
One more quick question: if I wanted to test out `bsize=8` and `parallel=single`, and have the agents take random steps within each of the 8 emulators at the same time, can I expect something like this to work?
```python
import multiprocessing

def rand_steps(emulator):
    for _ in range(5):
        current_obs = emulator.get_obs()
        current_prompt = current_obs['prompt']
        current_img = current_obs['image_feature'].unsqueeze(0)
        action = agent.get_action(current_prompt, current_img)[0]
        screenshot, reward, terminated = emulator.step(action)

if accelerator.is_main_process:
    env = construct_env(sample_mode="random")
    env.reset()
    emulators = env.emulators
    processes = []
    for emulator in emulators:
        p = multiprocessing.Process(target=rand_steps, args=(emulator,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
Or is there a built-in way to do this with your code?
Pardon me, but what do you mean by "take random steps"?
Sorry, not "random"; I just mean taking steps that work towards the `current_prompt`, as my code is trying to do. I am mostly wondering whether the agent takes steps in each of the 8 emulators in parallel and, if so, how to make that happen.
In our implementation, each machine has `bsize` emulators. Each emulator is an Android environment to interact with. The agent interacts with all emulators in parallel. Steps are taken sequentially within each emulator, and all the steps in one emulator constitute a trajectory. The illustration below might help you. To clarify, there is no parallelization within an emulator: the parallelization comes from having multiple emulators, not from taking parallel steps in each emulator.
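The pattern described above (parallel across emulators, sequential within each one) can be sketched with toy objects. Everything here is an assumption for illustration: `ToyEmulator`, `toy_policy`, and `collect_trajectories` are hypothetical names, and using a thread pool is a choice made for this sketch, not necessarily how the repository does it.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyEmulator:
    """Hypothetical stand-in for one emulator; it only counts steps."""

    def __init__(self, emulator_id, max_steps=3):
        self.emulator_id = emulator_id
        self.max_steps = max_steps
        self.steps_taken = 0

    def step(self, action):
        # Steps inside one emulator happen strictly one after another.
        self.steps_taken += 1
        observation = f"emu{self.emulator_id}-obs{self.steps_taken}"
        terminated = self.steps_taken >= self.max_steps
        return observation, terminated


def toy_policy(observation):
    """Hypothetical policy: returns a dummy action for an observation."""
    return f"action-for-{observation}"


def collect_trajectories(emulators):
    """Run rounds of steps; each round steps every unfinished emulator concurrently."""
    observations = [f"emu{e.emulator_id}-obs0" for e in emulators]
    trajectories = [[] for _ in emulators]
    done = [False] * len(emulators)
    with ThreadPoolExecutor(max_workers=max(1, len(emulators))) as pool:
        while not all(done):
            live = [i for i, finished in enumerate(done) if not finished]
            actions = {i: toy_policy(observations[i]) for i in live}
            # One step per unfinished emulator, issued in parallel.
            results = pool.map(lambda i: emulators[i].step(actions[i]), live)
            for i, (obs, terminated) in zip(live, results):
                trajectories[i].append((observations[i], actions[i]))
                observations[i] = obs
                done[i] = terminated
    return trajectories


bsize = 8
trajectories = collect_trajectories([ToyEmulator(i) for i in range(bsize)])
print(len(trajectories), [len(t) for t in trajectories])
```

With `bsize = 8` this yields eight trajectories, one per emulator, each built from that emulator's own sequential steps.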
Hi, thank you for open sourcing this work!
I am trying to understand how running emulators in parallel works. The `bsize` argument specifies how many `AndroidEmulator` emulators should be running in parallel within the one `BatchedAndroidEnv`. In `batch_interact_environment`, the agent only gets an action and takes a step `if accelerator.is_main_process`. I would have thought that an agent in each emulator would want to do this?

Also, what does the `parallel` argument do? Can `bsize=8` and `parallel=single` work together?