PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

Experiments #1

Closed javabean68 closed 6 years ago

javabean68 commented 6 years ago

Hello Maxim,

Your book is awesome. I gave it 5 stars on O'Reilly Safari. I modified something in Chapter04/02_frozenlake_naive, and after adding the wrapper below it seems to converge:

```python
import gym

class FrozenLakeRewardWrapper(gym.RewardWrapper):
    def __init__(self, env):
        super(FrozenLakeRewardWrapper, self).__init__(env)

    def reward(self, reward):
        if reward == 0:
            return 1
        else:
            return 2
```
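(A wrapper like this is applied by wrapping the environment before it is used; a minimal sketch, assuming the FrozenLake-v0 id from the book's chapter 4 code:)

```python
# Every step's reward now passes through
# FrozenLakeRewardWrapper.reward() before the agent sees it.
env = FrozenLakeRewardWrapper(gym.make("FrozenLake-v0"))
```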

I don't actually know what happens :-)

How can I then visualize the images/videos which are created? I uncommented the line `env = gym.wrappers.Monitor(env, directory="mon", force=True)` and got some files in the mon folder (e.g. openaigym.episode_batch.0.8090.stats.json), but I have no idea how to play/see them...

Could you give me a tip? Thank you so much!

Regards,
Fabio

Shmuma commented 6 years ago

Hi Fabio!

Thanks for your feedback!

Regarding your question: by changing the reward, you're effectively giving the agent more reward with every step, which motivates it to walk around the frozen lake rather than to solve the problem.

To illustrate, let's consider two episodes (each reaching the winning goal) under the old reward scheme:

0 -> 0 -> 0 -> 0 -> 0 -> 1
0 -> 0 -> 1

If gamma < 1, the second episode will give the agent a higher discounted reward, which pushes it toward reaching the terminal state in fewer steps.

Under the new reward scheme, the same episodes look like this:

1 -> 1 -> 1 -> 1 -> 1 -> 2
1 -> 1 -> 2

In this case, the second episode can give the agent a smaller total discounted reward (it depends on the actual gamma setting), which pushes the agent toward a totally different objective: walking around and keeping episodes as long as possible to collect more and more reward.
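To make this concrete, here is a small sketch (not from the original discussion) that computes the total discounted return of all four episodes, assuming gamma = 0.9:

```python
def discounted_return(rewards, gamma=0.9):
    """Total discounted return: sum of gamma^t * r_t over the episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Old reward scheme: the shorter winning episode scores higher.
print(discounted_return([0, 0, 0, 0, 0, 1]))  # ~0.59
print(discounted_return([0, 0, 1]))           # ~0.81 -> shorter is better

# New reward scheme: the longer episode now scores higher.
print(discounted_return([1, 1, 1, 1, 1, 2]))  # ~5.28 -> longer is better
print(discounted_return([1, 1, 2]))           # ~3.52
```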

Regarding getting images of an environment's state: you can call the env.render() method, which gives you the current game position. What you need to do is output this position to the screen or to a file yourself. Unfortunately, the Monitor class doesn't seem to support capturing such text-based environments at the moment.
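A minimal sketch of that approach (assuming the classic Gym API, gym < 0.26, and the FrozenLake-v0 id used in the book's chapter 4 examples):

```python
import gym

env = gym.make("FrozenLake-v0")
env.reset()
done = False
while not done:
    env.render()  # prints the current 4x4 text grid to stdout
    # Take a random action; a trained agent would pick one from its policy.
    _, reward, done, _ = env.step(env.action_space.sample())
env.render()  # also show the terminal state
```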

javabean68 commented 6 years ago

Hi Maxim, I had already used env.render() before your advice, and... the agent really does keep going without reaching the end! Your explanation is amazing!

I have ordered your book on Amazon as well, and I'll give it a great review there too!

Thank you very much: it's the first book in this area that really tries to explain things and isn't a mere copy of articles from the Internet.

Ciao Fabio