google-deepmind / lab

A customisable 3D platform for agent-based AI research

Lab + python multiprocessing #53

Closed edran closed 7 years ago

edran commented 7 years ago

I have implemented A3C with multiprocessing (+ pytorch) instead of threads, but bazel run seems to fail silently, without any visible trace. This is what I do:

$ bazel run :struct_runner --define headless=false
[...]
$ echo $?  # this is the exit code of the previous process
8

struct_runner.py initialises a lab environment, then creates a bunch of processes in which more envs are created. In particular, the silent crash happens when I create a process p and call p.start(). It also appears to be non-deterministic: the number of processes I manage to spawn before bazel kills them and quits varies between runs.

I know that @miyosuda has implemented A3C using threads here; however, multiprocessing is supported very well by pytorch, and it would be a shame to have to deal with thread management.
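When child processes vanish without a traceback, their exit codes are often the only clue. The following is a minimal sketch, not taken from struct_runner.py, of how a parent could report how each worker ended. The names `describe_exitcode` and `run_and_report` are hypothetical; `Process.exitcode` is standard `multiprocessing`, which reports death by signal as the negated signal number.

```python
import multiprocessing as mp
import signal


def describe_exitcode(code):
    """Turn a Process.exitcode value into a readable message."""
    if code is None:
        return "still running"
    if code == 0:
        return "exited cleanly"
    if code < 0:
        # multiprocessing reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    return f"exited with status {code}"


def run_and_report(target, num_workers):
    """Start num_workers processes running target(worker_id); report how each ended."""
    procs = [mp.Process(target=target, args=(i,)) for i in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return {p.name: describe_exitcode(p.exitcode) for p in procs}
```

Calling `run_and_report(worker_fn, 8)` from under an `if __name__ == "__main__":` guard would then show whether workers were killed by a signal (for example a segfault in native code) or exited with an ordinary non-zero status.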

tkoeppe commented 7 years ago

Can you run the binary directly without bazel run?

edran commented 7 years ago

As in, by copying over the "library" as suggested in #37?

tkoeppe commented 7 years ago

No, just by invoking the executable directly. It's somewhere below bazel-bin, have a look.

edran commented 7 years ago

$ ./bazel-bin/struct_runner
Failed to open library! - ./libdmlab.so
./libdmlab.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/edran/lab/doomstruct/ext/lab/bazel-bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 79, in <module>
    main()
  File "/home/edran/lab/doomstruct/ext/lab/bazel-bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 74, in main
    agents[config.agent], envs[config.env])
  File "/home/edran/lab/doomstruct/src/interface/spawner.py", line 29, in __init__
    dummy_env.setup()
  File "/home/edran/lab/doomstruct/src/interface/lab_interface.py", line 42, in setup
    'height': str(self.args.height)
RuntimeError: Failed to connect RL API

$ cd bazel-bin
$ ./struct_runner
Level not found: Failed to open file './baselab/game_scripts/dev/test_map.lua'
Traceback (most recent call last):
  File "/home/edran/.cache/bazel/_bazel_edran/db1fd2bdda117385e24e1bb01cc56fd6/execroot/lab/bazel-out/local-fastbuild/bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 79, in <module>
    main()
  File "/home/edran/.cache/bazel/_bazel_edran/db1fd2bdda117385e24e1bb01cc56fd6/execroot/lab/bazel-out/local-fastbuild/bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 74, in main
    agents[config.agent], envs[config.env])
  File "/home/edran/lab/doomstruct/src/interface/spawner.py", line 29, in __init__
    dummy_env.setup()
  File "/home/edran/lab/doomstruct/src/interface/lab_interface.py", line 42, in setup
    'height': str(self.args.height)
RuntimeError: Invalid levelName flag 'dev/test_map'

I guess most of the paths are hardcoded...

tkoeppe commented 7 years ago

Try the other binary below ./bazel-bin/struct_runner.runfiles/... somewhere. I'm not entirely sure off the top of my head. The current working directory may be relevant.

edran commented 7 years ago

If I cd into ./bazel-bin/struct_runner.runfiles/org_deepmind_lab it does indeed seem to spawn processes correctly!

However, how do I then give commands such as --define headless=false?

tkoeppe commented 7 years ago

The define is a build flag. So you say:
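A sketch of the build-then-run sequence being described, written as a small Python helper. It assumes the target name `struct_runner`, the `headless` define, and the `org_deepmind_lab` runfiles layout that appear elsewhere in this thread; the function names are hypothetical, and the exact location of the executable may differ between setups.

```python
import os
import subprocess


def build_command(target, headless):
    # --define is consumed by `bazel build`, not by the program that gets built.
    return ["bazel", "build", f":{target}", "--define", f"headless={headless}"]


def runfiles_dir(workspace_root, target, workspace_name="org_deepmind_lab"):
    # Layout assumed from the paths quoted in this thread.
    return os.path.join(
        workspace_root, "bazel-bin", f"{target}.runfiles", workspace_name
    )


def build_and_run(workspace_root, target, headless, program_args=()):
    """Build with the define, then run the binary from inside its runfiles tree."""
    subprocess.run(build_command(target, headless), cwd=workspace_root, check=True)
    cwd = runfiles_dir(workspace_root, target)
    # Running with cwd set to the runfiles tree lets relative asset paths resolve.
    completed = subprocess.run([os.path.join(cwd, target), *program_args], cwd=cwd)
    return completed.returncode
```

The point is that the define only affects what gets built; once the build has finished, the binary is started directly and takes only its own arguments.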

edran commented 7 years ago

Great, thank you!

Will map creation through the lua api work out of the box if I run things this way, or should I be wary of other such possible issues?

tkoeppe commented 7 years ago

I would hope that it works :-)

edran commented 7 years ago

Great - going to close the issue then, however it might be worth documenting this stuff somewhere (or to fix bazel ¯\_(ツ)_/¯ ).

alversafa commented 4 years ago

This issue seems to have been closed more than 2 years ago. However, I am having the same problem (I tried the above solution).

I wrote a wrapper around DMLab to make it more like Gym and in each of the processes that I create, I initialize the environment by using this wrapper. It seems that the processes just die after this initialization of the environments.

What should I do? Thanks.

tkoeppe commented 4 years ago

Can you give some details (or, preferably, explore yourself) of how you are running your binaries? Do they have access to the required assets?

alversafa commented 4 years ago

I am running it as:

bazel build :python_a3c_agent --define headless=osmesa
./bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab/python_a3c_agent

where python_a3c_agent.py is the main file that starts all the processes.

(By the way, I am running bazel through a singularity container.)

How do we check whether they have access?

tkoeppe commented 4 years ago

Could you explore the output directory structure a bit, i.e. everything below bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab? Are the expected files anywhere there?
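One way to do that exploration programmatically is sketched below. It walks a directory tree and reports where given names occur, and whether each path actually resolves (a dangling symlink shows up as `False`). The names to look for, `libdmlab.so` and `baselab`, are taken from the error messages earlier in this thread; `find_in_tree` is a hypothetical helper, and whether these are the only required assets is an assumption.

```python
import os


def find_in_tree(root, names):
    """Map each wanted file or directory name to (path, resolves) pairs under root."""
    found = {name: [] for name in names}
    # os.walk lists symlinked entries even though it does not descend into
    # symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            if entry in found:
                path = os.path.join(dirpath, entry)
                # os.path.exists follows symlinks, so a broken link gives False.
                found[entry].append((path, os.path.exists(path)))
    return found
```

For example, `find_in_tree("bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab", ["libdmlab.so", "baselab"])` returns an empty list for anything missing, which may be worth checking from inside the container as well as outside it.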

alversafa commented 4 years ago

The expected files seem to be around there. I also implemented the multithreaded version and it seems to be working without any problem.

I have found where the dying is happening: resetting DMLab in created processes. When I create an environment inside python_a3c_agent.py everything is working as expected and I can reset it without any problem. However, inside the processes created by python_a3c_agent.py, after env = deepmind_lab.Lab(...) the processes die right after the env.reset() line.

I found where the problem is happening; however, I have no idea how to solve it.
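To get more than a silent death out of the child, one option is to enable Python's `faulthandler` inside each worker and log each stage with an explicit flush. This is only a diagnostic sketch and does not fix anything. `run_worker` and `make_env` are hypothetical names; `make_env` stands for whatever builds the environment, for instance a function that calls `deepmind_lab.Lab(...)` with the arguments already used in the agent.

```python
import faulthandler
import os
import sys


def _log(message):
    # Flush immediately: buffered output is lost if the process is killed.
    print(message, file=sys.stderr, flush=True)


def run_worker(make_env, log=_log):
    """Create and reset an environment inside the current (child) process."""
    try:
        # On a fatal signal such as SIGSEGV, dump the Python stack to stderr.
        faulthandler.enable()
    except (AttributeError, ValueError, RuntimeError):
        pass  # stderr has no usable file descriptor in this context
    pid = os.getpid()
    log(f"[{pid}] creating env")
    env = make_env()
    log(f"[{pid}] resetting env")
    env.reset()
    log(f"[{pid}] reset returned")
    return env
```

If the last line printed by a dying worker is "resetting env" and it is followed by a `faulthandler` dump, the crash is inside the native reset call rather than in the Python wrapper.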