Hi,
We are sorry this caused confusion. We currently have two tutorials: one is about a DQN-based news recommender system, and the other is a contextual bandit tutorial.
In terms of the FrozenLake environment, I think the quick start code shown in the README file is pretty close to a solution.
One important thing to note is that you need to convert each observation, which is an index of the current state, into a representation vector (such as a one-hot vector) for that state. This is because in our design, all states are assumed to be representation vectors rather than indices.
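For example (just an illustrative snippet, not part of the solution below): on a 3x3 map there are 9 states, so observation index 2 would be encoded as a length-9 vector with a 1 in position 2.

    import torch

    num_states = 9   # a 3x3 FrozenLake map has 9 states
    state_index = 2  # an example observation (state index) returned by the environment
    state_vector = torch.nn.functional.one_hot(torch.tensor(state_index), num_states).float()
    # state_vector == tensor([0., 0., 1., 0., 0., 0., 0., 0., 0.])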
Then you would end up with code like the following, which works for this problem.
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment
from pearl.pearl_agent import PearlAgent
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import torch

# A small random 3x3 map keeps training fast
env = GymEnvironment("FrozenLake-v1", is_slippery=False, desc=generate_random_map(size=3))
num_actions = env.action_space.n

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=env.observation_space.n,  # one-hot state vectors have one entry per grid cell
        action_space=env.action_space,
        hidden_dims=[64],
        training_rounds=20,
        learning_rate=0.01,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=num_actions
        ),
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)

def one_hot_vector(index, length):
    # sets the index-th element to 1 and the rest to 0
    return torch.zeros(length).scatter_(0, torch.tensor([index]), 1)

for i in range(1000):
    observation, action_space = env.reset()
    observation_tensor = one_hot_vector(observation, env.observation_space.n)
    agent.reset(observation_tensor, action_space)
    done = False
    rtn = 0
    while not done:
        action = agent.act(exploit=False)  # explore during training
        action_result = env.step(action)
        # convert the state index to a one-hot vector before passing it to the agent
        action_result.observation = one_hot_vector(action_result.observation, env.observation_space.n)
        agent.observe(action_result)
        agent.learn()
        done = action_result.done
        rtn += action_result.reward
        if done:
            print(f"Episode: {i}, Return: {rtn}")
So nice! Thank you. If I have any other questions about solving problems like this, should I comment here or open a new issue?
You are welcome. We are glad that you are playing with the code and providing feedback.
If the new questions are not related to this issue, I suggest starting a new issue.
Please note that we have revised the code example above to avoid having to deal with NumPy arrays and dtypes.
We also introduced a one_hot_vector function to make that operation more explicit, and selected a very small map (size 3) so the example runs quickly.
Please also note that we have added a FrozenLake Jupyter notebook tutorial that is an even simpler version (using the online_learning function, which automates the learning loop for you).
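For reference, with online_learning the training loop could look roughly like the sketch below. The import path and keyword arguments here are written from memory and may differ in the current version, and the environment must already return vector (one-hot) observations, for example via an environment wrapper as in the notebook, so please refer to the notebook for the exact code.

    from pearl.utils.functional_utils.train_and_eval.online_learning import online_learning

    # assumes `agent` is built as above and `env` already returns one-hot observations
    online_learning(
        agent=agent,
        env=env,
        number_of_episodes=1000,
        print_every_x_episodes=100,
    )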
Hi! I want to solve the FrozenLake problem with Pearl, but how can I do that? I do not know how to use Pearl for solving problems. You said Pearl can be used for real-world problems, but you did not provide any example of that. Please help me!