google-deepmind / lab

A customisable 3D platform for agent-based AI research

'Connection Interrupted' problem #101

Open gchlodzinski opened 6 years ago

gchlodzinski commented 6 years ago

Hi,

I have a problem with Lab: the screen shows 'Connection Interrupted' and the agent stops responding to any command (passed via the env.step function). Here is what it looks like: https://youtu.be/vL842dKvG0k

Here is how I initialize Lab:

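import deepmind_lab

# Config values are passed as strings.
config = {
    'fps': str(60),
    'width': str(320),
    'height': str(240)
}
level = 'seekavoid_arena_01'
# Request the RGBD and RGB observations and use the hardware renderer.
env = deepmind_lab.Lab(level, ['RGBD', 'RGB'], config=config, renderer='hardware')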

Then, during agent evaluation, I capture RGB observations and create video files from them. I create a single Lab environment instance, but I do it in a separate multiprocessing.Process object. I use Python 3 and the pip version of the environment.

The interesting thing is that ALWAYS exactly 20 video files out of 100 have this issue; the remaining 80 are fine. I repeated the video generation 4 times, and 20 files were broken every time. Even more interesting, 'Connection Interrupted' ALWAYS occurs in the same episodes (I use env.reset with seed=0), EVEN when I use different environments (I checked Seek Avoid arena and Stairway to Melon). So the 4th episode is always broken, then the 6th ... and finally the 87th.

Please check if there is anything that could fix this problem.

tkoeppe commented 6 years ago

There's a memory bug in the Python 3 implementation I just noticed. Let me fix that first. (The per-module state doesn't work; if you want to work around that, just switch it back to using a single, global state like we use for Python 2.)

gchlodzinski commented 6 years ago

I have switched back to the old implementation as you advised, but it did not change the outcome: I am still getting a number of broken episodes. It still happens at exactly the same points (the same episodes end up with the problem). Here are the episode numbers where it happens: 4, 6, 8, 10, 12, 29, 31, 33, 35, 37, 54, 56, 58, 60, 62, 79, 81, 83, 85, 87
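As a side note on the list above, a quick check suggests the broken episodes follow a regular pattern: the same five offsets recur every 25 episodes. The snippet is only arithmetic on the numbers reported here; the variable names are illustrative and it says nothing about the cause.

```python
from collections import Counter

# Episode numbers reported as broken in this thread.
broken = [4, 6, 8, 10, 12, 29, 31, 33, 35, 37,
          54, 56, 58, 60, 62, 79, 81, 83, 85, 87]

# Reduce each episode number modulo 25 and count how often each offset occurs.
offset_counts = Counter(n % 25 for n in broken)
offsets = sorted(offset_counts)

print(offsets)                       # [4, 6, 8, 10, 12]
print(set(offset_counts.values()))   # {4}
```

In other words, episodes 4, 6, 8, 10 and 12 of each block of 25 are affected, four blocks in total.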

charlesbeattie commented 6 years ago

Are you examining is_running() after each step? It could be that the environment is finishing and not being reset properly.
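For reference, a minimal sketch of the kind of loop this question has in mind: check is_running() before reading observations or stepping again. FakeEnv is a hypothetical stand-in so the example runs without deepmind_lab installed; its method names (reset, step, is_running, observations) mirror the calls mentioned in this thread, and run_episode is an illustrative helper, not part of Lab.

```python
class FakeEnv:
    """Hypothetical stand-in for a Lab environment (not the real API)."""

    def __init__(self, episode_length):
        self._episode_length = episode_length
        self._steps_left = 0

    def reset(self, seed=None):
        self._steps_left = self._episode_length

    def is_running(self):
        return self._steps_left > 0

    def step(self, action, num_steps=1):
        self._steps_left -= num_steps
        return 0.0  # reward

    def observations(self):
        # Reading observations from a finished episode raises an error here,
        # matching the behaviour described in this thread.
        if not self.is_running():
            raise RuntimeError('environment is not running')
        return {'RGB': 'frame'}


def run_episode(env, action, seed):
    """Resets once, then steps until the episode ends; returns RGB frames."""
    env.reset(seed=seed)
    frames = []
    while env.is_running():
        frames.append(env.observations()['RGB'])
        env.step(action, num_steps=1)
    return frames


frames = run_episode(FakeEnv(episode_length=5), action=None, seed=0)
print(len(frames))  # 5
```

With a real environment, the action would be whatever env.step expects for the chosen level; None is used here only because the stand-in ignores it.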

gchlodzinski commented 6 years ago

Yes @charlesbeattie, I do. If I did not, I could not get observations after an environment step (env.observations() throws an exception).

But inspired by your comment, I ran a few more tests around resetting the environment state, and it looks like I found what is causing this issue. Calling env.reset WHILE the environment is still being reset seems to cause the problem. When I changed my code to reset the environment conditionally:

if not env.is_running():
    env.reset(seed=my_seed)

the problem went away. Of course, only being able to reset when the environment is not running is a bit limiting. So could you consider accepting a reset command only after the previous reset has finished? Although calling env.reset twice is the result of a bug in my code, I still think adding such a safeguard makes sense.
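Until something like that exists, a caller-side version of the safeguard could look roughly like the sketch below. It is only an illustration of the workaround described above: reset_if_finished and CountingEnv are hypothetical names, and CountingEnv is a stand-in that merely counts reset calls so the example runs without deepmind_lab.

```python
class CountingEnv:
    """Hypothetical stand-in that records how often reset() is called."""

    def __init__(self):
        self.reset_calls = 0
        self._running = False

    def reset(self, seed=None):
        self.reset_calls += 1
        self._running = True

    def is_running(self):
        return self._running

    def finish(self):
        # Simulates the episode coming to an end.
        self._running = False


def reset_if_finished(env, seed):
    """Resets env only when no episode is in progress.

    Returns True if a reset was issued, False if it was skipped.
    """
    if env.is_running():
        return False
    env.reset(seed=seed)
    return True


env = CountingEnv()
first = reset_if_finished(env, seed=0)   # no episode yet, so it resets
second = reset_if_finished(env, seed=0)  # episode in progress, so it is skipped
env.finish()
third = reset_if_finished(env, seed=0)   # episode over, so it resets again
print(first, second, third, env.reset_calls)  # True False True 2
```

The trade-off is the one mentioned above: an episode that is still running cannot be restarted early through this helper.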

tkoeppe commented 4 years ago

Hm, I think we should in principle accept resets (which are EnvCApi::start calls) at any point, regardless of the current episode state, but I'm not sure. @charlesbeattie, what do you think?

tkoeppe commented 4 years ago

Does this problem still exist? Do you have some self-contained reproduction instructions? In principle, resetting should work at any point, so the workaround shouldn't be necessary.