DigiRL-agent / digirl

Official repo for the paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
Apache License 2.0

How are emulators running in parallel? #7

Closed. piperwolters closed this issue 3 months ago.

piperwolters commented 3 months ago

Hi, thank you for open sourcing this work!

I am trying to understand how running emulators in parallel works. The bsize argument specifies how many AndroidEmulator instances should run in parallel within a single BatchedAndroidEnv. However, in batch_interact_environment, the agent only gets an action and takes a step if accelerator.is_main_process is true. Wouldn't the agent need to do this in each emulator?

Also, what does the parallel argument do? Can bsize=8 and parallel=single work together?

BiEchi commented 3 months ago

Thanks for your question.

  1. Can bsize=8 and parallel=single work together? I've added some notes in the multi-machine setup guide:

Note that bsize in the configuration files means the number of parallel emulators on each machine. When multiple worker machines collect trajectories in parallel, the total number of parallel emulators is the sum of bsize across all machines.

So yes, bsize=8 works perfectly with parallel=single. It also works with parallel=worker if you're running on multiple worker machines.

  2. What does the parallel argument do? The parallel argument determines whether emulation runs on multiple machines. If you run emulators on a single machine, set parallel: single. If you run on multiple machines, set parallel: host on the host machine (which aggregates the trajectories and trains the model) and parallel: worker on the worker machines (which run emulators to collect data).

  3. In batch_interact_environment, the agent only gets an action and takes a step if accelerator.is_main_process is true. Wouldn't the agent need to do this in each emulator? accelerator is for multi-GPU training; it has nothing to do with emulation.
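As a rough illustration of the answers above, the sketch below models how bsize and parallel combine across machines. MachineConfig and total_emulators are hypothetical names for illustration, not part of the DigiRL code; only the setting names bsize and parallel and the values single, host and worker come from this thread. It also assumes that, in a multi-machine setup, only worker machines run emulators while the host aggregates trajectories and trains, so the host's bsize is not counted.

```python
from dataclasses import dataclass

# Allowed values of the `parallel` setting, as described in this thread.
VALID_MODES = {"single", "host", "worker"}


@dataclass
class MachineConfig:
    """Hypothetical per-machine view of the `bsize` and `parallel` settings."""
    name: str
    bsize: int      # number of emulators launched on this machine
    parallel: str   # "single", "host", or "worker"

    def __post_init__(self):
        if self.parallel not in VALID_MODES:
            raise ValueError(f"unknown parallel mode: {self.parallel!r}")
        if self.bsize < 1:
            raise ValueError("bsize must be a positive integer")


def total_emulators(machines):
    """Number of emulators collecting trajectories at the same time.

    Assumption: machines with parallel == "single" or "worker" run emulators,
    while a "host" only aggregates trajectories and trains, so it is skipped.
    """
    modes = {m.parallel for m in machines}
    if "single" in modes and len(machines) > 1:
        raise ValueError("parallel: single is meant for a one-machine setup")
    return sum(m.bsize for m in machines if m.parallel in ("single", "worker"))


# One machine with bsize=8 and parallel=single: 8 emulators.
print(total_emulators([MachineConfig("local", 8, "single")]))  # 8

# A host plus three workers, each worker with bsize=8: 24 emulators.
cluster = [MachineConfig("host", 8, "host")] + [
    MachineConfig(f"worker{i}", 8, "worker") for i in range(3)
]
print(total_emulators(cluster))  # 24
```

The point of the sketch is that bsize is a per-machine count: adding worker machines scales the total number of emulators without changing bsize itself.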

piperwolters commented 3 months ago

Thank you!

One more quick question - if I wanted to test out bsize=8 and parallel=single, and have the agents take random steps within each of the 8 emulators at the same time, can I expect something like this to work?

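    import multiprocessing

    # Assumes `agent`, `accelerator`, and `construct_env` are already
    # defined elsewhere in the script.

    def rand_steps(emulator):
        # Take up to 5 sequential steps in a single emulator.
        for _ in range(5):
            current_obs = emulator.get_obs()

            current_prompt = current_obs['prompt']
            current_img = current_obs['image_feature'].unsqueeze(0)

            # Ask the agent for one action given this emulator's observation.
            action = agent.get_action(current_prompt, current_img)[0]

            screenshot, reward, terminated = emulator.step(action)
            if terminated:
                break  # stop once this emulator's episode has ended

    if accelerator.is_main_process:
        env = construct_env(sample_mode="random")
        env.reset()
        emulators = env.emulators

        # Start one process per emulator. This stays inside the
        # main-process branch so `emulators` is always defined here.
        processes = []
        for emulator in emulators:
            p = multiprocessing.Process(target=rand_steps, args=(emulator,))
            p.start()
            processes.append(p)

        for p in processes:
            p.join()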

or is there a built-in way to do this with your code?

BiEchi commented 3 months ago

Pardon me, but what do you mean by "take random steps"?

piperwolters commented 3 months ago

Sorry, I don't mean "random". I mean taking steps toward completing the current_prompt, as my code tries to do. I am mostly wondering whether the agent takes steps in each of the 8 emulators in parallel, and if so, how to make that happen.

BiEchi commented 3 months ago

In our implementation, each machine has bsize emulators, and each emulator is an Android environment to interact with. The agent interacts with all emulators in parallel, but steps are taken sequentially within each emulator, and all the steps in one emulator make up a trajectory. The illustration below might help you. To clarify, there is no parallelization within an emulator: the parallelism comes from having multiple emulators, not from taking parallel steps in any one emulator.
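To make the distinction concrete, here is a minimal sketch of the pattern described above: parallelism across emulators, sequential steps within each one. FakeEmulator, batched_policy and collect_trajectories are hypothetical stand-ins, not DigiRL's AndroidEmulator, BatchedAndroidEnv or agent API, and this is not necessarily how the repo implements it. At every round, the agent produces one action per still-running emulator as a batch, then all those emulators step concurrently in a thread pool. Each emulator only takes its next step after its previous one returns, so its steps form one ordered trajectory.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeEmulator:
    """Hypothetical stand-in for one Android emulator."""

    def __init__(self, idx, max_steps):
        self.idx = idx
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return f"obs-{self.idx}-0"

    def step(self, action):
        # A real emulator would execute the action on the device and return
        # a new screenshot; here we only advance a step counter.
        self.t += 1
        done = self.t >= self.max_steps
        reward = 1.0 if done else 0.0
        return f"obs-{self.idx}-{self.t}", reward, done


def batched_policy(observations):
    """Hypothetical agent call: one action per observation, computed as a batch."""
    return [f"tap({obs})" for obs in observations]


def collect_trajectories(emulators):
    """Step all emulators concurrently; steps inside each emulator stay sequential."""
    obs = [e.reset() for e in emulators]
    done = [False] * len(emulators)
    trajectories = [[] for _ in emulators]
    with ThreadPoolExecutor(max_workers=len(emulators)) as pool:
        while not all(done):
            active = [i for i, d in enumerate(done) if not d]
            actions = batched_policy([obs[i] for i in active])
            # Submit one step per active emulator so they run at the same time.
            futures = [pool.submit(emulators[i].step, a) for i, a in zip(active, actions)]
            # Wait for every step in this round before choosing the next actions.
            for i, a, fut in zip(active, actions, futures):
                next_obs, reward, is_done = fut.result()
                trajectories[i].append((obs[i], a, reward))
                obs[i] = next_obs
                done[i] = is_done
    return trajectories


emulators = [FakeEmulator(i, max_steps=i + 2) for i in range(8)]  # bsize = 8
trajs = collect_trajectories(emulators)
print([len(t) for t in trajs])  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Threads rather than processes are used in this sketch on the assumption that each step mostly waits on an external emulator, and threads also avoid having to pickle emulator handles into child processes.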