eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

TypeError: must be real number, not dict while running highway-env with DQN #89

Closed siddahant closed 2 years ago

siddahant commented 2 years ago

 PS C:\Users\sjain198\rl-eee598ddqn> python -u "c:\Users\sjain198\rl-eee598ddqn\rl-agents\test.py"
C:\Users\sjain198\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\gym\utils\passive_env_checker.py:31: UserWarning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (5, 5)
  logger.warn(
Preferred device cuda:best unavailable, switching to default cpu
C:\Users\sjain198\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\gym\utils\passive_env_checker.py:289: UserWarning: WARN: No render fps was declared in the environment (env.metadata['render_fps'] is None or not defined), rendering may occur at inconsistent fps.
  logger.warn(
C:\Users\sjain198\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\gym\wrappers\monitoring\video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <OrderEnforcing<PassiveEnvChecker<HighwayEnvFast<highway-fast-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
  logger.warn(
state = [array([[ 1.        ,  0.756834  ,  0.        ,  0.3125    ,  0.        ],
       [ 1.        ,  0.10442047,  0.6666667 , -0.04089725,  0.        ],
       [ 1.        ,  0.23773758,  0.        , -0.01617631,  0.        ],
       [ 1.        ,  0.36480722,  0.        , -0.03598242,  0.        ],
       [ 1.        ,  0.4843581 ,  0.6666667 , -0.03250245,  0.        ]],
      dtype=float32)]
c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\pytorch.py:81: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ..\torch\csrc\utils\tensor_new.cpp:204.)
  state_actions = self.value_net(torch.tensor(states, dtype=torch.float).to(self.device))
state = [[-0.0101861  -0.04867542  0.02736989  0.13716325  0.09483606]]
state = [{'speed': 25, 'crashed': False, 'action': 3, 'rewards': {'collision_reward': 0.0, 'right_lane_reward': 0.0, 'high_speed_reward': 0.5, 'on_road_reward': 1.0}}]
Traceback (most recent call last):
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\test.py", line 48, in <module>
    evaluation.train()
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\trainer\evaluation.py", line 116, in train
    self.run_episodes()
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\trainer\evaluation.py", line 144, in run_episodes
    reward, terminal = self.step()
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\trainer\evaluation.py", line 164, in step
    actions = self.agent.plan(self.observation)
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\common\abstract.py", line 47, in plan
    return [self.act(state)]
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\abstract.py", line 79, in act
    return tuple(self.act(agent_state, step_exploration_time=False) for agent_state in state)
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\abstract.py", line 79, in <genexpr>
    return tuple(self.act(agent_state, step_exploration_time=False) for agent_state in state)
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\abstract.py", line 82, in act
    values = self.get_state_action_values(state)
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\abstract.py", line 143, in get_state_action_values
    return self.get_batch_state_action_values([state])[0]
  File "c:\Users\sjain198\rl-eee598ddqn\rl-agents\rl_agents\agents\deep_q_network\pytorch.py", line 81, in get_batch_state_action_values
    state_actions = self.value_net(torch.tensor(states, dtype=torch.float).to(self.device))
TypeError: must be real number, not dict


eleurent commented 2 years ago

I pushed a bugfix, let me know if that solves the issue.

siddahant commented 2 years ago

No, I'm still getting the same error. Here is my sample Colab code:

```python

# Environment

!pip install highway-env
import gym
import highway_env

# Agent

!pip install git+https://github.com/eleurent/rl-agents#egg=rl-agents
from rl_agents.agents.common.factory import agent_factory

# Visualisation

import sys
from tqdm.notebook import trange
!pip install gym pyvirtualdisplay
!apt-get install -y xvfb python-opengl ffmpeg
!git clone https://github.com/eleurent/highway-env.git
sys.path.insert(0, './highway-env/scripts/')
from utils import record_videos, show_videos

# Make environment

env = gym.make("highway-fast-v0")
env = record_videos(env)
obs, done = env.reset(), False

# Make agent

agent_config = {
    "__class__": "<class 'rl_agents.agents.deep_q_network.pytorch.DQNAgent'>",
    "model": {
        "type": "MultiLayerPerceptron",
        "layers": [256, 256]
    },
    "double": False,
    "loss_function": "l2",
    "optimizer": {
        "lr": 5e-4
    },
    "gamma": 0.8,
    "n_steps": 1,
    "batch_size": 32,
    "memory_capacity": 15000,
    "target_update": 50,
    "exploration": {
        "method": "EpsilonGreedy",
        "tau": 6000,
        "temperature": 1.0,
        "final_temperature": 0.05
    }
}
agent = agent_factory(env, agent_config)

# Run episode

for step in trange(env.unwrapped.config["duration"], desc="Running..."):
    action = agent.act(obs)
    obs, reward, done, info = env.step(action)

env.close()
show_videos()

```


TypeError                                 Traceback (most recent call last)
<ipython-input-5-521d5e2a82ab> in <module>
     53 # Run episode
     54 for step in trange(env.unwrapped.config["duration"], desc="Running..."):
---> 55     action = agent.act(obs)
     56     obs, reward, done, info = env.step(action)
     57
/usr/local/lib/python3.7/dist-packages/rl_agents/agents/deep_q_network/abstract.py in act(self, state, step_exploration_time)
     76         # TODO: it would be more efficient to forward a batch of states
     77         if isinstance(state, tuple):
---> 78             return tuple(self.act(agent_state, step_exploration_time=False) for agent_state in state)
     79 
     80         # Single-agent setting

/usr/local/lib/python3.7/dist-packages/rl_agents/agents/deep_q_network/abstract.py in <genexpr>(.0)
     76         # TODO: it would be more efficient to forward a batch of states
     77         if isinstance(state, tuple):
---> 78             return tuple(self.act(agent_state, step_exploration_time=False) for agent_state in state)
     79 
     80         # Single-agent setting

/usr/local/lib/python3.7/dist-packages/rl_agents/agents/deep_q_network/abstract.py in act(self, state, step_exploration_time)
     79 
     80         # Single-agent setting
---> 81         values = self.get_state_action_values(state)
     82         self.exploration_policy.update(values)
     83         return self.exploration_policy.sample()

/usr/local/lib/python3.7/dist-packages/rl_agents/agents/deep_q_network/abstract.py in get_state_action_values(self, state)
    138         :return: [Q(a1,s), ..., Q(an,s)] the array of its action-values for each actions
    139         """
--> 140         return self.get_batch_state_action_values([state])[0]
    141 
    142     def step_optimizer(self, loss):

/usr/local/lib/python3.7/dist-packages/rl_agents/agents/deep_q_network/pytorch.py in get_batch_state_action_values(self, states)
     78 
     79     def get_batch_state_action_values(self, states):
---> 80         return self.value_net(torch.tensor(states, dtype=torch.float).to(self.device)).data.cpu().numpy()
     81 
     82     def save(self, filename):

TypeError: must be real number, not dict

siddahant commented 2 years ago

This is what I am getting as states:

tensor([[ 1.0000,  0.9185,  0.0000,  0.3125,  0.0000,  1.0000,  0.1066,  0.0000,
         -0.0432,  0.0000,  1.0000,  0.2050,  0.2500, -0.0254,  0.0000,  1.0000,
          0.2994,  0.7500, -0.0447,  0.0000,  1.0000,  0.4037,  0.7500, -0.0229,
          0.0000]])
[{'speed': 25, 'crashed': False, 'action': 4, 'rewards': {'collision_reward': 0.0, 'right_lane_reward': 0.0, 'high_speed_reward': 0.5, 'on_road_reward': 1.0}}]

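For context, a likely reading of these prints (hedged, assuming the gym >= 0.26 API, which the fix below confirms): `env.reset()` now returns an `(obs, info)` tuple and `env.step()` returns a 5-tuple, so old-style unpacking leaves an info dict inside the state handed to `agent.act()`. The agent interprets a tuple state as a multi-agent observation and iterates over it, which is why the observation tensor prints fine and the dict then reaches `torch.tensor()`, raising the `TypeError`. A minimal sketch of the reset-side unpacking difference:

```python
import gym
import highway_env  # registers "highway-fast-v0"

env = gym.make("highway-fast-v0")

# Old-style unpacking: with gym >= 0.26, reset() returns (observation, info),
# so `obs` is the whole tuple and its last element is the info dict that
# eventually reaches torch.tensor() inside the DQN agent.
obs, done = env.reset(), False
print(type(obs))      # tuple
print(type(obs[-1]))  # dict

# New-style unpacking keeps only the observation array in `obs`.
(obs, info), done = env.reset(), False
print(type(obs))      # the observation array
```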

eleurent commented 2 years ago

Ah I see, sorry I forgot to update the colab.

You have to update two lines as follows:

(obs, info), done = env.reset(), False

and

    obs, reward, done, truncated, info = env.step(action)
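
Putting both changes into the episode loop from the Colab snippet, a corrected version would look roughly like this (a sketch assuming the gym >= 0.26 reset/step signatures; everything else is unchanged from the snippet above):

```python
# Corrected episode loop (sketch): gym >= 0.26 returns (obs, info) from reset()
# and a 5-tuple (obs, reward, terminated, truncated, info) from step().
(obs, info), done = env.reset(), False

for step in trange(env.unwrapped.config["duration"], desc="Running..."):
    action = agent.act(obs)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()
```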
siddahant commented 2 years ago

Thank you! Now it works on Colab.