Open AlexanderHuels opened 9 months ago
I think I have managed it.
Install Gymnasium if it is not already installed:

    pip install gymnasium
Import the libraries:

    import gymnasium as gym
    import numpy as np
    import time, pickle, os
Initialise the FrozenLake environment:

    env = gym.make('FrozenLake-v1', render_mode='ansi', is_slippery=False)
    env.reset()
    print(env.render())
The next parts are fine and can be used as in the example:

    # Epsilon for an epsilon-greedy approach
    epsilon = 0.95
    total_episodes = 1000
    maximum_steps = 100
    learning_rate = 0.75
    gamma = 0.96

    Q = np.zeros((env.observation_space.n, env.action_space.n))

    def select_action(state):
        if np.random.uniform(0, 1) < epsilon:
            # If the sampled random number is smaller than epsilon, a random action is chosen.
            action = env.action_space.sample()
        else:
            # Otherwise, choose the action with the maximum value in the Q-table for this state.
            action = np.argmax(Q[state, :])
        return action

    def agent_learn(state, state_next, reward, action):
        predict = Q[state, action]
        target = reward + gamma * np.max(Q[state_next, :])
        Q[state, action] = Q[state, action] + learning_rate * (target - predict)
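
To see what a single call to `agent_learn` does to the table, here is a small worked example of the same update rule. It is only a sketch: it uses plain Python lists instead of NumPy so it runs without extra packages, and the 2-state table, its values, and the helper name `q_update` are made up for illustration.

```python
# One tabular Q-learning update, with the same hyperparameters as the post.
learning_rate = 0.75
gamma = 0.96

# Hypothetical 2-state, 2-action table; the values are illustrative only.
Q = [[0.0, 0.0],
     [0.5, 1.0]]

def q_update(Q, state, action, reward, state_next):
    """Apply Q[s][a] += lr * (r + gamma * max(Q[s']) - Q[s][a]) in place."""
    predict = Q[state][action]
    target = reward + gamma * max(Q[state_next])
    Q[state][action] = predict + learning_rate * (target - predict)
    return Q[state][action]

# Taking action 1 in state 0 with reward 0 and landing in state 1:
# target = 0 + 0.96 * 1.0, so the entry moves 75% of the way from 0 to 0.96.
new_value = q_update(Q, state=0, action=1, reward=0.0, state_next=1)
print(new_value)
```

The updated entry ends up at roughly 0.72, which shows how value propagates backwards from a state that already has a high estimate even when the immediate reward is zero.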
Last but not least, the final block, adjusted slightly:

    for episode in range(total_episodes):
        # reset() returns (observation, info); keep only the observation
        state = env.reset()[0]
        t = 0
        while t < maximum_steps:
            action = select_action(state)
            print(action, type(action))
            # step() returns five values here, not four
            state_next, reward, terminated, truncated, info = env.step(action)
            agent_learn(state, state_next, reward, action)
            state = state_next
            t += 1
            if terminated or truncated:
                break

    # Save the learned Q-table
    with open("QTable_FrozenLake.pkl", 'wb') as f:
        pickle.dump(Q, f)
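
Once the table has been pickled, it can be loaded again and turned into a greedy policy. The following is a minimal sketch of that round trip, not part of the original example: the small table is a made-up stand-in for the trained one, it is a plain list rather than a NumPy array, and the file is written to a temporary directory so nothing is overwritten.

```python
import os
import pickle
import tempfile

# Hypothetical 3-state, 4-action table standing in for the trained Q-table.
Q = [[0.1, 0.7, 0.2, 0.0],
     [0.0, 0.0, 0.9, 0.3],
     [0.0, 0.0, 0.0, 0.0]]

# Write to a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "QTable_FrozenLake.pkl")
with open(path, "wb") as f:
    pickle.dump(Q, f)

with open(path, "rb") as f:
    Q_loaded = pickle.load(f)

# Greedy policy: for each state, the index of the largest Q-value.
# Ties (such as the all-zero row) resolve to the first index.
greedy_policy = [row.index(max(row)) for row in Q_loaded]
print(greedy_policy)  # [1, 2, 0]
```

With the NumPy table from the code above, `np.argmax(Q, axis=1)` gives the same per-state choice.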
Value Iteration With Frozen Lake does not work.

    /opt/conda/lib/python3.10/site-packages/gym/envs/toy_text/ UserWarning: WARN: You are calling render method without specifying any render mode. You can specify the render_mode at initialization, e.g. gym("FrozenLake-v1", render_mode="rgb_array")
      logger.warn(

    ValueError                                Traceback (most recent call last)
    Cell In[7], line 10
          6 env.render()
          8 action = select_action(state)
    ---> 10 state_next, reward, done, info = env.step(action)
         12 agent_learn(state, state_next, reward, action)
         14 state = state_next

    ValueError: too many values to unpack (expected 4)
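
This `ValueError` is consistent with the newer step API: in Gym 0.26 and later, and in Gymnasium, `env.step()` returns five values (`obs, reward, terminated, truncated, info`) and `env.reset()` returns `(obs, info)`, so unpacking into four names fails. One way to keep older tutorial code working is a small shim that accepts either shape. This is only a sketch: `unpack_step` and `unpack_reset` are hypothetical helper names, not part of either library, and the tuples below are fake results standing in for real environment calls.

```python
def unpack_step(result):
    """Normalise an env.step() result to (obs, reward, done, info)."""
    if len(result) == 5:
        # Newer API: separate terminated and truncated flags.
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    # Older API: a single done flag.
    obs, reward, done, info = result
    return obs, reward, done, info

def unpack_reset(result):
    """Return only the observation from an env.reset() result.

    Assumes observations are not themselves (value, dict) tuples,
    which holds for FrozenLake's integer states.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result

# Fake results standing in for env.step(action) under each API.
new_style = (4, 0.0, False, True, {"prob": 1.0})
old_style = (4, 0.0, False, {"prob": 1.0})

print(unpack_step(new_style))
print(unpack_step(old_style))
print(unpack_reset((0, {"prob": 1})))
```

In the training loop this would be used as `state_next, reward, done, info = unpack_step(env.step(action))`, although unpacking the five values directly is simpler if only the newer API needs to be supported.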