facebookresearch / Pearl

A production-ready Reinforcement Learning AI Agent Library brought to you by the Applied Reinforcement Learning team at Meta.

Is there some limitation with the dimensions of actions and observations? #27

Open paapu88 opened 6 months ago

paapu88 commented 6 months ago

Dear developers, I'm getting the following error when running the code below:

File "pearl/neural_networks/common/value_networks.py", line 262, in get_q_values
    x = torch.cat([state_batch, action_batch], dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 4 and 2

Am I doing something stupid, or is there some limitation (for instance, must the action and observation spaces have the same number of dimensions)? Regards, Markus

""" 
copy pasted from 
https://github.com/facebookresearch/Pearl?tab=readme-ov-file#quick-start

with small modifications for training,

"""

from pearl.pearl_agent import PearlAgent
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment
from pearl.action_representation_modules.identity_action_representation_module import (
    IdentityActionRepresentationModule,
)
from pearl.utils.functional_utils.train_and_eval.online_learning import online_learning

import torch
import matplotlib.pyplot as plt
import numpy as np

# env = GymEnvironment("highway-v0", render_mode="human")

# env = GymEnvironment("CartPole-v1", render_mode="human")
env = GymEnvironment("CarRacing-v2", render_mode="human", continuous=False)
observation, action_space = env.reset()
print("observation:")
print(observation)
print("action_space attributes:")
print(dir(action_space))
print(f"action dim: {action_space.action_dim}")
# print(f"actions: {action_space.actions}")

# sys.exit()

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=9216,  # 96 * 96; note that CarRacing frames are actually 96x96x3 RGB images
        action_space=action_space,
        hidden_dims=[64, 64],
        training_rounds=20,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=5
        ),
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)

# experiment code
number_of_steps = 10000
record_period = 1000

info = online_learning(
    agent=agent,
    env=env,
    number_of_steps=number_of_steps,
    print_every_x_steps=1000,
    record_period=record_period,
    learn_after_episode=True,
)
torch.save(info["return"], "CarRacing-DQN-return.pt")
plt.plot(record_period * np.arange(len(info["return"])), info["return"], label="DQN")
plt.legend()
plt.show()
rodrigodesalvobraz commented 6 months ago

I'm looking into this and will get back to you.

BillMatrix commented 5 months ago

@paapu88 is your environment's observation space an image or a video?

paapu88 commented 5 months ago

@BillMatrix see https://www.gymlibrary.dev/environments/box2d/car_racing/#
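
For context, CarRacing-v2 observations are 96x96x3 RGB frames, so a batch of states is a 4-D tensor while a batch of one-hot actions is 2-D. A minimal standalone sketch of the failing concatenation (batch size and shapes chosen here for illustration):

import torch

# A batch of CarRacing image observations: (batch, height, width, channels) -> 4-D
state_batch = torch.zeros(32, 96, 96, 3)
# A batch of one-hot encoded discrete actions: (batch, num_actions) -> 2-D
action_batch = torch.zeros(32, 5)

# The Q-value network concatenates state and action along the last axis,
# which requires both tensors to have the same number of dimensions:
torch.cat([state_batch, action_batch], dim=-1)
# RuntimeError: Tensors must have same number of dimensions: got 4 and 2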

jb3618columbia commented 5 months ago

I think the error is because you are using a VanillaQValueNetwork, which requires the state and action batches to have the same number of tensor dimensions, since it concatenates them into a single flat vector. For image inputs, you want to use the CNNQValueNetwork as the network type (we need to enable that for deep Q-learning).
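
For illustration only, here is a minimal plain-PyTorch sketch of the idea behind a CNN Q-value network (this is not Pearl's CNNQValueNetwork API; the class name and layer sizes here are assumptions): a convolutional encoder flattens the image state to a 2-D feature batch before the action is concatenated, so the torch.cat above succeeds.

import torch
import torch.nn as nn

class TinyCNNQNetwork(nn.Module):
    # Hypothetical sketch: encode an image state with a CNN, then
    # concatenate the flat features with the one-hot action and regress Q.
    def __init__(self, action_dim: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4),   # 96x96 -> 23x23
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 23x23 -> 10x10
            nn.ReLU(),
            nn.Flatten(),                                # -> (batch, 32 * 10 * 10)
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 10 * 10 + action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # one scalar Q-value per (state, action) pair
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, 3, 96, 96) image; action: (batch, action_dim) one-hot
        features = self.encoder(state)              # now 2-D: (batch, 3200)
        x = torch.cat([features, action], dim=-1)   # same number of dims, so this works
        return self.head(x).squeeze(-1)

With an encoder like this, the state and action batches are both 2-D at the concatenation point, which is what get_q_values expects.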

rodrigodesalvobraz commented 5 months ago

We are going to implement a fix.