crowdAI / marLo

Multi Agent Reinforcement Learning using MalmÖ
MIT License

Bugged frames #59

Closed dennissv closed 5 years ago

dennissv commented 5 years ago

I've been training an agent for the first round of the marLo competition. It works fine most of the time, but sometimes it just runs around in a circle. After saving the observations from these episodes and looking at them, I discovered that the observations in these episodes are bugged and frozen (a couple of examples below).

bugged_frame_1_19

bugged_frame_2

When this bug occurs (10-15% of the episodes), all of the frames during that episode look identical and warped like this, which understandably confuses the agent. However, when looking at the Minecraft game window, everything looks fine. I've also noticed a spike in memory usage whenever this bug appears.

Below is a minimal example to replicate it:

import marlo
from skimage import io
import numpy as np

client_pool = [('127.0.0.1', 10000)]
join_tokens = marlo.make('MarLo-FindTheGoal-v0',
                          params={
                            "client_pool": client_pool
                          })
# As this is a single agent scenario,
# there will just be a single token
assert len(join_tokens) == 1
join_token = join_tokens[0]

env = marlo.init(join_token)

nr_bugged = 0
run = 0
try:
    while True:  # Runs until interrupted (Ctrl+C)
        steps = 0
        bugged = False
        old_obs = None
        observation = env.reset()
        while steps < 20:  # The frames bug right away after reset so no need to go further
            _action = env.action_space.sample()
            obs, reward, done, info = env.step(_action)
            # First couple of frames are equal due to loading, so skip them
            if steps > 3 and np.array_equal(obs, old_obs):
                io.imsave('bugged_frame_%d_%d.png' % (run, steps), obs)
                bugged = True
            old_obs = obs
            steps += 1
            if done:  # Don't step an episode that has already ended
                break
        run += 1
        nr_bugged += bugged
        print('Completed %d run(s), %d of which were bugged (%.2f%%)'
              % (run, nr_bugged, (nr_bugged/run)*100))
finally:
    env.close()  # Reached on interrupt, unlike a close placed after the infinite loop

I've noticed that the frames are bugged most often after the first reset, and that the bug becomes slightly less common the more episodes are run. So while it might not affect training too much (although I suspect it still does quite a bit), I wonder whether this bug is also present on the evaluator that's used for the leaderboard. If so, it could heavily impact the scores.

I'm using the latest master branch of marLo. I've also tried adding a time.sleep before/after every action in case it was a sync issue, but that doesn't seem to help.

AndKram commented 5 years ago

What OS are you using? Is it possible that the processor or memory is maxed out? While I occasionally see duplicate observations, I have not seen any corrupted images. There is a suggestion to add an initial warm-up episode (one reset) before evaluation for other reasons, which would also help with any start-up issues.
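
A warm-up along those lines could be sketched roughly as follows. This is only a minimal sketch, assuming a Gym-style env whose step() returns (obs, reward, done, info), as in the example above; warm_up and its parameters are hypothetical names, not part of marLo.

```python
def warm_up(env, episodes=1, max_steps=20):
    """Run throwaway episodes so start-up glitches happen before evaluation.

    Hypothetical helper, not part of marLo. Assumes a Gym-style env with
    reset(), step() and action_space.sample().
    """
    for _ in range(episodes):
        env.reset()
        for _ in range(max_steps):
            # Observations and rewards are deliberately discarded here
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                break
```

Calling warm_up(env) once after marlo.init(join_token), and before any scored episode, would keep the first reset out of the evaluation.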

dennissv commented 5 years ago

I'm on Ubuntu 16.04. But yes, it does seem to be a memory issue, because I can't reproduce it on servers with more memory. On my laptop (where I originally encountered this problem) I normally have ~2-3 GB of memory to spare when running everything, and most of the episodes are fine, but during these episodes I guess it has trouble allocating memory and fails like this. Thanks for looking into it.
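
To check whether bugged episodes really line up with low free memory, the available memory could be logged before each reset. A minimal sketch, assuming Linux (it reads /proc/meminfo, so it will not work on other systems); available_memory_mb is a hypothetical helper, not part of marLo.

```python
def available_memory_mb(meminfo_path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in MiB, or None if absent.

    Linux-only assumption: /proc/meminfo reports this field in kB.
    """
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line looks like "MemAvailable:    2048000 kB"
                return int(line.split()[1]) / 1024
    return None
```

Printing this value right before env.reset() in the replication script would show whether the frozen frames coincide with the dips in free memory.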