Hi,
We are sorry this caused confusion. We currently have two tutorials: one is about a DQN-based news recommender system, and the other is a contextual bandit tutorial.
In terms of the FrozenLake environment, I think the quick start code shown in the README file is pretty close to a solution.
One important thing to note is that you need to convert each observation, which is an index of the current state, into a representation vector (such as a one-hot vector) for that state. This is because in our design, all states are assumed to be representation vectors rather than indices.
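For example (just an illustrative snippet, not part of the solution below): on a 3x3 map there are 9 states, so observation index 2 would be encoded as a length-9 vector with a 1 in position 2.

    import torch

    num_states = 9   # a 3x3 FrozenLake map has 9 states
    state_index = 2  # an example observation (state index) returned by the environment
    state_vector = torch.nn.functional.one_hot(torch.tensor(state_index), num_states).float()
    # state_vector == tensor([0., 0., 1., 0., 0., 0., 0., 0., 0.])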
Then you would end up with code like the following, which works for this problem.
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment
from pearl.pearl_agent import PearlAgent
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import torch

# A small random 3x3 map keeps training fast
env = GymEnvironment("FrozenLake-v1", is_slippery=False, desc=generate_random_map(size=3))
num_actions = env.action_space.n

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=env.observation_space.n,  # one-hot state vectors have one entry per grid cell
        action_space=env.action_space,
        hidden_dims=[64],
        training_rounds=20,
        learning_rate=0.01,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=num_actions
        ),
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)

def one_hot_vector(index, length):
    # sets the index-th element to 1 and the rest to 0
    return torch.zeros(length).scatter_(0, torch.tensor([index]), 1)

for i in range(1000):
    observation, action_space = env.reset()
    observation_tensor = one_hot_vector(observation, env.observation_space.n)
    agent.reset(observation_tensor, action_space)
    done = False
    rtn = 0
    while not done:
        action = agent.act(exploit=False)  # explore during training
        action_result = env.step(action)
        # convert the state index to a one-hot vector before passing it to the agent
        action_result.observation = one_hot_vector(action_result.observation, env.observation_space.n)
        agent.observe(action_result)
        agent.learn()
        done = action_result.done
        rtn += action_result.reward
        if done:
            print(f"Episode: {i}, Return: {rtn}")
So nice! Thank you. If I have any other questions about solving problems like this, should I comment here or open a new issue?
You are welcome. We are glad that you are playing with the code and providing feedback.
If the new questions are not related to this issue, I suggest starting a new issue.
Please note that we have revised the code example above to avoid having to deal with NumPy arrays and dtypes.
We also introduced a one_hot_vector function to make that operation more explicit, and selected a very small map (size 3) so the example runs quickly.
Please also note that we have added a FrozenLake Jupyter notebook tutorial that is an even simpler version (using the online_learning function, which automates the learning loop for you).
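For reference, with online_learning the training loop could look roughly like the sketch below. The import path and keyword arguments here are written from memory and may differ in the current version, and the environment must already return vector (one-hot) observations, for example via an environment wrapper as in the notebook, so please refer to the notebook for the exact code.

    from pearl.utils.functional_utils.train_and_eval.online_learning import online_learning

    # assumes `agent` is built as above and `env` already returns one-hot observations
    online_learning(
        agent=agent,
        env=env,
        number_of_episodes=1000,
        print_every_x_episodes=100,
    )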
Hi! I want to solve the FrozenLake problem with Pearl, but how can I do that? I do not know how to use Pearl for solving problems. You said Pearl can be used for real-world problems, but you did not provide any example of that. Please help me!