Closed: edran closed this issue 7 years ago
Can you run the binary directly, without bazel run?
As in, by copying over the "library" like suggested in #37?
No, just by invoking the executable directly. It's somewhere below bazel-bin, have a look.
$ ./bazel-bin/struct_runner
Failed to open library! - ./libdmlab.so
./libdmlab.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/home/edran/lab/doomstruct/ext/lab/bazel-bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 79, in <module>
main()
File "/home/edran/lab/doomstruct/ext/lab/bazel-bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 74, in main
agents[config.agent], envs[config.env])
File "/home/edran/lab/doomstruct/src/interface/spawner.py", line 29, in __init__
dummy_env.setup()
File "/home/edran/lab/doomstruct/src/interface/lab_interface.py", line 42, in setup
'height': str(self.args.height)
RuntimeError: Failed to connect RL API
$ cd bazel-bin
$ ./struct_runner
Level not found: Failed to open file './baselab/game_scripts/dev/test_map.lua'
Traceback (most recent call last):
File "/home/edran/.cache/bazel/_bazel_edran/db1fd2bdda117385e24e1bb01cc56fd6/execroot/lab/bazel-out/local-fastbuild/bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 79, in <module>
main()
File "/home/edran/.cache/bazel/_bazel_edran/db1fd2bdda117385e24e1bb01cc56fd6/execroot/lab/bazel-out/local-fastbuild/bin/struct_runner.runfiles/org_deepmind_lab/dev/main.py", line 74, in main
agents[config.agent], envs[config.env])
File "/home/edran/lab/doomstruct/src/interface/spawner.py", line 29, in __init__
dummy_env.setup()
File "/home/edran/lab/doomstruct/src/interface/lab_interface.py", line 42, in setup
'height': str(self.args.height)
RuntimeError: Invalid levelName flag 'dev/test_map'
I guess most of the paths are hardcoded...
Try the other binary somewhere below ./bazel-bin/struct_runner.runfiles/... I'm not entirely sure off the top of my head. The current working directory may be relevant.
If I cd into ./bazel-bin/struct_runner.runfiles/org_deepmind_lab it does indeed seem to spawn processes correctly!
However, how do I then give commands such as --define headless=false?
The define is a build flag. So you say:
bazel build :struct_runner --define headless=false
bazel-bin/struct_runner.runfiles/org_deepmind_lab/struct_runner
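For anyone who wants to script this, here is a minimal sketch of a launcher that starts the built binary with the runfiles directory as its working directory, which is the setup reported to work above. The helper names (runfiles_dir, run_from_runfiles) are made up for illustration, and the workspace name org_deepmind_lab is simply taken from the paths in this thread. Since --define is a build flag, it is not passed here; the binary has to be built with it beforehand.

```python
import os
import subprocess


def runfiles_dir(target, workspace="org_deepmind_lab", bazel_bin="bazel-bin"):
    """Return the runfiles directory for a built target (hypothetical helper).

    The layout bazel-bin/<target>.runfiles/<workspace> is the one seen in
    this thread; other setups may differ.
    """
    return os.path.join(bazel_bin, target + ".runfiles", workspace)


def run_from_runfiles(target, extra_args=()):
    """Run the target's binary with its runfiles directory as the cwd."""
    cwd = runfiles_dir(target)
    # Use an absolute path so the lookup does not depend on how the
    # platform resolves a relative executable when cwd is set.
    binary = os.path.abspath(os.path.join(cwd, target))
    return subprocess.call([binary] + list(extra_args), cwd=cwd)
```

From the repository root, after bazel build :struct_runner, calling run_from_runfiles("struct_runner") would then take the place of the cd step.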
Great, thank you!
Will map creation through the lua api work out of the box if I run things this way, or should I be wary of other such possible issues?
I would hope that it works :-)
Great, going to close the issue then. However, it might be worth documenting this stuff somewhere (or fixing bazel ¯\_(ツ)_/¯).
This issue seems to have been closed more than 2 years ago. However, I am having the same problem. (I tried the above solution.)
I wrote a wrapper around DMLab to make it more like Gym and in each of the processes that I create, I initialize the environment by using this wrapper. It seems that the processes just die after this initialization of the environments.
What should I do? Thanks.
Can you give some details (or, preferably, explore yourself) of how you are running your binaries? Do they have access to the required assets?
I am running it as:
bazel build :python_a3c_agent --define headless=osmesa
./bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab/python_a3c_agent
where python_a3c_agent.py is the main file that starts all the processes.
(By the way, I am running bazel through a singularity container.)
How do we check whether they have access?
Could you explore the output directory structure a bit, i.e. everything below bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab? Are the expected files anywhere there?
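One way to do that exploration is a small script that walks the runfiles tree and reports where particular names turn up. This is only a sketch: find_in_tree is a hypothetical helper, and the names to look for (libdmlab.so and game_scripts, both taken from the error messages earlier in this thread) are a guess at what "expected files" covers. It follows symlinks because Bazel typically builds the runfiles tree out of symlinks.

```python
import os


def find_in_tree(root, names):
    """Map each wanted file or directory name to the paths where it occurs."""
    wanted = set(names)
    found = {name: [] for name in wanted}
    # followlinks=True so that symlinked directories are descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        for entry in dirnames + filenames:
            if entry in wanted:
                found[entry].append(os.path.join(dirpath, entry))
    return found


if __name__ == "__main__":
    root = "bazel-bin/python_a3c_agent.runfiles/org_deepmind_lab"
    for name, paths in find_in_tree(root, ["libdmlab.so", "game_scripts"]).items():
        print(name, "->", paths if paths else "NOT FOUND")
```

An empty list for a name means nothing with that name exists anywhere below the root, which would point to missing assets rather than a working-directory problem.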
The expected files seem to be around there. I also implemented the multithreaded version and it seems to be working without any problem.
I have found where the dying is happening: resetting DMLab in created processes. When I create an environment inside python_a3c_agent.py everything is working as expected and I can reset it without any problem. However, inside the processes created by python_a3c_agent.py, after env = deepmind_lab.Lab(...) the processes die right after the env.reset() line.
I found where the problem is happening; however, I have no idea how to solve it.
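When child processes die without a traceback, their exit codes usually say something about how they died. The sketch below is a diagnostic aid, not a fix. It relies on two standard-library facts: multiprocessing.Process.exitcode is negative when the child was killed by a signal, and faulthandler can dump a Python traceback on fatal signals such as a segfault. The names describe_exit, guarded and run_and_report are hypothetical, and worker stands for whatever function creates the environment and calls env.reset().

```python
import faulthandler
import multiprocessing as mp
import signal


def describe_exit(exitcode):
    """Turn a multiprocessing.Process.exitcode into a readable message."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # A negative exit code -N means the child was killed by signal N.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "signal %d" % -exitcode
        return "killed by " + name
    return "exited with status %d" % exitcode


def guarded(worker, *args):
    """Child entry point: enable faulthandler, then run the real worker."""
    # On a fatal signal this prints the Python stack to stderr.
    faulthandler.enable()
    worker(*args)


def run_and_report(worker, args=(), count=2):
    """Start `count` children running worker(*args); describe how each ended."""
    procs = [mp.Process(target=guarded, args=(worker,) + tuple(args))
             for _ in range(count)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [describe_exit(p.exitcode) for p in procs]
```

If run_and_report comes back with something like "killed by SIGSEGV", the crash is in native code and the faulthandler output on stderr should show which Python line triggered it.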
I have implemented A3C with multiprocessing (+ pytorch) as opposed to using threads, however bazel run seems to break silently, without any visible trace. This is what I do: struct_runner.py initialises a lab environment, then creates a bunch of processes in which more envs are created. In particular, the silent crash happens when I create a process p and do p.start(). It also appears to be non-deterministic with respect to the number of processes I manage to spawn before bazel kills them and quits.
I know that @miyosuda has implemented A3C using threads here, however multiprocessing is supported very well by pytorch and it would be a shame to have to deal with thread management.