PacktPublishing / Deep-Learning-with-PyTorch-1.x

Deep Learning with PyTorch 1.x, published by Packt

Chapter 09 #4

Open · AlexanderHuels opened this issue 9 months ago

AlexanderHuels commented 9 months ago

Value Iteration With Frozen Lake does not work.

  1. It runs into a failure at `env = gym.make('FrozenLake-v0')`. The error message says to use v1 instead of v0.
  2. Done. But when running the last code block, it says:

```
/opt/conda/lib/python3.10/site-packages/gym/envs/toy_text/frozen_lake.py:271: UserWarning: WARN: You are calling render method without specifying any render mode. You can specify the render_mode at initialization, e.g. gym("FrozenLake-v1", render_mode="rgb_array")
  logger.warn(

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 10
      6     env.render()
      8     action = select_action(state)
---> 10     state_next, reward, done, info = env.step(action)
     12     agent_learn(state, state_next, reward, action)
     14     state = state_next

ValueError: too many values to unpack (expected 4)
```
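
The underlying cause is the newer step API: in gymnasium (and gym >= 0.26), `env.reset()` returns an `(observation, info)` tuple and `env.step()` returns five values instead of four. A minimal sketch of the difference, assuming a gymnasium install (not the book's code):

```python
import gymnasium as gym

env = gym.make('FrozenLake-v1', render_mode='ansi')
state, info = env.reset()       # reset() now returns (observation, info)

# step() now returns five values; the old single `done` flag is split in two.
state_next, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated
```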

AlexanderHuels commented 9 months ago

I think I have managed it.

Install Gymnasium if not already done:

```
pip install gymnasium
```

Import the libraries:

```python
import gymnasium as gym
import numpy as np
import time, pickle, os
```

Initialise the FrozenLake environment:

```python
env = gym.make('FrozenLake-v1', render_mode='ansi', is_slippery=False)
env.reset()
print(env.render())
```

The next parts are OK, and as used in the example:

```python
# Epsilon for an epsilon-greedy approach
epsilon = 0.95
total_episodes = 1000

# Maximum number of steps to be run for every episode
maximum_steps = 100
learning_rate = 0.75

# The discount factor
gamma = 0.96

Q = np.zeros((env.observation_space.n, env.action_space.n))

def select_action(state):
    action = 0
    if np.random.uniform(0, 1) < epsilon:
        # If the random number sampled is smaller than epsilon then a random action is chosen.
        action = env.action_space.sample()
    else:
        # If the random number sampled is greater than epsilon then we choose an action
        # having the maximum value in the Q-table.
        action = np.argmax(Q[state, :])
    return action

def agent_learn(state, state_next, reward, action):
    predict = Q[state, action]
    target = reward + gamma * np.max(Q[state_next, :])
    Q[state, action] = Q[state, action] + learning_rate * (target - predict)
```
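
For reference, `agent_learn` above is the standard tabular Q-learning update, Q[s, a] ← Q[s, a] + α · (r + γ · max_a' Q[s', a'] − Q[s, a]). A tiny standalone sanity check of the same update (the values here are made up, purely for illustration):

```python
import numpy as np

learning_rate, gamma = 0.75, 0.96
Q_demo = np.zeros((16, 4))        # FrozenLake 4x4: 16 states, 4 actions
Q_demo[4, 2] = 0.5                # pretend the next state already has some value

state, action, reward, state_next = 0, 1, 0.0, 4
predict = Q_demo[state, action]
target = reward + gamma * np.max(Q_demo[state_next, :])
Q_demo[state, action] += learning_rate * (target - predict)
print(Q_demo[state, action])      # 0.75 * (0.0 + 0.96 * 0.5 - 0.0) = 0.36
```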

Last but not least, the final block, adjusted a little:

```python
for episode in range(total_episodes):
    state = env.reset()[0]
    t = 0

    while t < maximum_steps:
        env.render()

        action = select_action(state)
        print(action, type(action))
        state_next, reward, terminated, truncated, info = env.step(action)
        print(env.render())

        agent_learn(state, state_next, reward, action)

        state = state_next

        t += 1

        if terminated or truncated:
            break

        time.sleep(0.1)

print(Q)

with open("QTable_FrozenLake.pkl", 'wb') as f:
    pickle.dump(Q, f)
```
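
As an optional follow-up (not part of the book's code), the pickled Q-table can be loaded back and the learned policy run greedily, something along these lines:

```python
import pickle
import numpy as np
import gymnasium as gym

# Load the saved Q-table and run one greedy episode (illustrative only).
with open("QTable_FrozenLake.pkl", 'rb') as f:
    Q = pickle.load(f)

env = gym.make('FrozenLake-v1', render_mode='ansi', is_slippery=False)
state, info = env.reset()
for t in range(100):                         # cap the episode length
    action = int(np.argmax(Q[state, :]))     # always take the best-known action
    state, reward, terminated, truncated, info = env.step(action)
    print(env.render())
    if terminated or truncated:
        break
print("Final reward:", reward)
```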