facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.
MIT License

Examples #14

Closed · fakerybakery closed this issue 11 months ago

fakerybakery commented 11 months ago

Hi, will you be providing any examples of real-world implementations of Pearl? Thank you for creating this amazing project!

BillMatrix commented 11 months ago

Hi there, thanks for reaching out! We are working on adding more examples/tutorials after our NeurIPS presentation! Will close this task once the examples are in.

fakerybakery commented 11 months ago

Hi, one question, do you think there's any possibility of using this on LLMs?

fakerybakery commented 11 months ago

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

rodrigodesalvobraz commented 11 months ago

Hi, one question, do you think there's any possibility of using this on LLMs?

In principle, yes, especially if the LLM is implemented in PyTorch.

rodrigodesalvobraz commented 11 months ago

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

Unfortunately it does not look like NeurIPS makes videos available. I am checking to see if slides can be made available.

rodrigodesalvobraz commented 11 months ago

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

We will make the slides available on the Pearl website next week or so. Thanks.

BillMatrix commented 11 months ago

@fakerybakery wanted to add a bit of clarification on LLM support. We don't officially support LLMs in the current beta version yet, but in principle you could build an interim solution. Depending on whether you'd like to use language or tokens as the action space, you would need to integrate a Hugging Face tokenizer or a transformer/language representation module into both the history and action representation modules. If you need to fine-tune the representations, you would also need to have those models' parameters tracked by the policy learner.

We will also try to add language-based action and observation support at some point in the future. Hope this helps.
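For illustration only, here is a rough sketch of the token-as-action idea described above: a torch.nn.Module that represents integer token-id actions with frozen input embeddings from a Hugging Face model. The class name TokenActionRepresentation, the model choice, and the interface are assumptions, not an existing Pearl module; a real implementation would have to conform to Pearl's action representation module interface (like OneHotActionTensorRepresentationModule).

import torch
import torch.nn as nn
from transformers import AutoModel


class TokenActionRepresentation(nn.Module):
    """Sketch: map integer token-id actions to frozen LM input embeddings."""

    def __init__(self, model_name: str = "distilbert-base-uncased") -> None:
        super().__init__()
        lm = AutoModel.from_pretrained(model_name)
        # Reuse only the input embedding table and freeze it, so the policy
        # learner does not update the LM unless fine-tuning is intended.
        self.embeddings = lm.get_input_embeddings()
        for p in self.embeddings.parameters():
            p.requires_grad = False
        self.representation_dim = self.embeddings.embedding_dim

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: integer action ids, shape (batch_size,) or (batch_size, 1)
        if token_ids.dim() > 1:
            token_ids = token_ids.squeeze(-1)
        return self.embeddings(token_ids.long())

If fine-tuning the language representation is desired, the embedding parameters would need requires_grad=True and would have to be tracked by the policy learner's optimizer, as noted above.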

paapu88 commented 11 months ago

Any progress on examples, other than lots of promises?

BillMatrix commented 11 months ago

@paapu88 Stay tuned. As promised, the first set of tutorials will come out this week.

BillMatrix commented 11 months ago

Our NeurIPS presentation slides are now shared on the repo front page. Please check them out. The first set of examples will be released tomorrow, with more coming in January 2024.

paapu88 commented 11 months ago

Excellent, the presentation is very good. Unfortunately, there are not that many examples there. Below is my poor man's example (which possibly manages steps 1 and 2) to illustrate the sort of basic example I was looking for:

1) Take some Gymnasium environment, or even better a Gymnasium-derived environment (like https://github.com/Farama-Foundation/HighwayEnv).
2) Optimize the agent with deep Q-learning and save it.
3) Load the trained agent and run a demo with it in the environment.

""" 
copy pasted from 
https://github.com/facebookresearch/Pearl?tab=readme-ov-file#quick-start

with small modifications for training,

NOTE: this environment is such that it is ok to go out of box, only falling pole is penalized.

"""

from pearl.pearl_agent import PearlAgent
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment
from pearl.utils.functional_utils.train_and_eval.online_learning import online_learning

import torch
import matplotlib.pyplot as plt
import numpy as np

env = GymEnvironment("CartPole-v1", render_mode="human")
observation, action_space = env.reset()

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=4,
        action_space=action_space,
        hidden_dims=[64, 64],
        training_rounds=20,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=action_space.n
        ),
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)

# experiment code
number_of_steps = 10000
record_period = 1000

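# online_learning runs the agent-environment interaction loop for number_of_steps
# steps, calling agent.learn() after each episode (learn_after_episode=True) and
# recording the episodic return every record_period steps in the returned dict.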
info = online_learning(
    agent=agent,
    env=env,
    number_of_steps=number_of_steps,
    print_every_x_steps=1000,
    record_period=record_period,
    learn_after_episode=True,
)
torch.save(info["return"], "CartPole-DQN-return.pt")
plt.plot(record_period * np.arange(len(info["return"])), info["return"], label="DQN")
plt.legend()
plt.show()

# model=???
# model.load_state_dict(torch.load("CartPole-DQN-return.pt"))
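For step 3 (reloading the trained agent and running a demo), one option, sketched here under the assumption that pickling the whole agent is acceptable: torch.save can serialize the entire PearlAgent, and the reloaded agent can then be run in exploit mode using the same act/observe loop as in the Pearl quick start. This is not an official Pearl serialization API, and the file name CartPole-DQN-agent.pt is illustrative. Note that CartPole-DQN-return.pt above contains only the recorded returns, not the model weights.

# Save the trained agent (pickles the whole object, including its replay buffer).
torch.save(agent, "CartPole-DQN-agent.pt")

# Later, e.g. in a separate script: reload the agent and run one demo episode.
agent = torch.load("CartPole-DQN-agent.pt")
env = GymEnvironment("CartPole-v1", render_mode="human")
observation, action_space = env.reset()
agent.reset(observation, action_space)
done = False
while not done:
    action = agent.act(exploit=True)  # greedy action, no exploration
    action_result = env.step(action)
    agent.observe(action_result)
    done = action_result.done
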
BillMatrix commented 11 months ago

As promised, the first tutorial is out for a recommender system environment. https://github.com/facebookresearch/Pearl/tree/main?tab=readme-ov-file#tutorials

More tutorials will come next year. Merry Christmas, all! I'll close this issue for now; feel free to open other issues if you have any other questions. Thanks!