AgileRL / AgileRL

Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools.
https://agilerl.com
Apache License 2.0

[Documentation] "Cartpole with Rainbow DQN" Error when executing "Test loop for inference" code #223

Closed · oppure closed this issue 3 months ago

oppure commented 3 months ago

What version of AgileRL are you using? 0.1.27
What operating system and processor architecture are you using? Windows 10 x64

Steps to reproduce the behaviour:

1. Create a .py file and copy/paste the code from https://docs.agilerl.com/en/latest/tutorials/gymnasium/agilerl_rainbow_dqn_tutorial.html, paragraphs "Dependencies", "Create the Environment", "Instantiate an Agent", "Experience Replay", and "Training and Saving an Agent - Using AgileRL train_off_policy function". Execute the code.

2. Create another .py file and copy/paste the code from the paragraphs "Load agent" and "Test loop for inference", adding the required imports on top and the INIT_HP definition. This is the code:

import gymnasium as gym
from agilerl.algorithms.dqn_rainbow import RainbowDQN
import torch
import numpy as np
import os
import imageio

INIT_HP = {
    "BATCH_SIZE": 64,  # Batch size
    "LR": 0.0001,  # Learning rate
    "GAMMA": 0.99,  # Discount factor
    "MEMORY_SIZE": 100_000,  # Max memory buffer size
    "LEARN_STEP": 1,  # Learning frequency
    "N_STEP": 3,  # Step number to calculate td error
    "PER": True,  # Use prioritized experience replay buffer
    "ALPHA": 0.6,  # Prioritized replay buffer parameter
    "BETA": 0.4,  # Importance sampling coefficient
    "TAU": 0.001,  # For soft update of target parameters
    "PRIOR_EPS": 0.000001,  # Minimum priority for sampling
    "NUM_ATOMS": 51,  # Unit number of support
    "V_MIN": -200.0,  # Minimum value of support
    "V_MAX": 200.0,  # Maximum value of support
    "NOISY": True,  # Add noise directly to the weights of the network
    # Swap image channels dimension from last to first [H, W, C] -> [C, H, W]
    "CHANNELS_LAST": False,  # Use with RGB states
    "EPISODES": 200,  # Number of episodes to train for
    "EVAL_EPS": 20,  # Number of episodes after which to evaluate the agent after
    "TARGET_SCORE": 200.0,  # Target score that will beat the environment
    "EVO_LOOP": 3,  # Number of evaluation episodes
    "MAX_STEPS": 500,  # Maximum number of steps an agent takes in an environment
}

# Load saved agent
rainbow_dqn = RainbowDQN.load("RainbowDQN_0_200.pt")

rewards = []
frames = []
testing_eps = 7
test_env = gym.make("CartPole-v1", render_mode="rgb_array")
with torch.no_grad():
    for ep in range(testing_eps):
        state = test_env.reset()[0]  # Reset environment at start of episode
        score = 0

        for step in range(INIT_HP["MAX_STEPS"]):
            # If your state is an RGB image
            if INIT_HP["CHANNELS_LAST"]:
                state = np.moveaxis(state, [-1], [-3])

            # Get next action from agent
            action, *_ = rainbow_dqn.getAction(state)

            # Save the frame for this step and append to frames list
            frame = test_env.render()
            frames.append(frame)

            # Take the action in the environment
            state, reward, terminated, truncated, _ = test_env.step(
                action
            )  # Act in environment

            # Collect the score of environment 0
            score += reward

            # Break if environment 0 is done or truncated
            if terminated or truncated:
                break

        # Collect and print episodic reward
        rewards.append(score)
        print("-" * 15, f"Episode: {ep}", "-" * 15)
        print("Episodic Reward: ", rewards[-1])

    test_env.close()

What did you expect to see?

--------------- Episode: 0 ---------------
Episodic Reward:  103.0
--------------- Episode: 1 ---------------
Episodic Reward:  107.0
.......

What did you see instead? Describe the bug.

Traceback (most recent call last):
  File "C:...\miniconda3\envs\agilerl\Lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "d:...\python\agile_rl\cartpole\testrender.py", line 66, in <module>
    action, *_ = rainbow_dqn.getAction(state)
  File "C:...\miniconda3\envs\agilerl\Lib\site-packages\agilerl\algorithms\dqn_rainbow.py", line 287, in getAction
    action_values = self.actor(state)
  File "C:...\miniconda3\envs\agilerl\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:...\miniconda3\envs\agilerl\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:...\miniconda3\envs\agilerl\Lib\site-packages\agilerl\networks\evolvable_mlp.py", line 316, in forward
    x = torch.sum(x * self.support, dim=2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
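
The error itself is a generic PyTorch device mismatch: one operand of x * self.support ends up on cuda:0 and the other on the CPU. A minimal standalone snippet (illustrative only, not AgileRL code; requires a CUDA machine) raises the same RuntimeError:

import torch

x = torch.rand(1, 2, 51)                             # activations left on the CPU
support = torch.linspace(-200.0, 200.0, 51).cuda()   # support tensor on the GPU
torch.sum(x * support, dim=2)                        # RuntimeError: tensors on cuda:0 and cpu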

Additional context: miniconda, Python 3.11.9, torch 2.3.0+cu118, CUDA 11.8

Adding:

rainbow_dqn.device = "cuda"
rainbow_dqn.actor.cuda()

after the rainbow_dqn = RainbowDQN.load("RainbowDQN_0_200.pt") line fixes the problem.

nicku-a commented 3 months ago

Thanks for raising this. You can simply solve this issue by doing:

device = "cuda" if torch.cuda.is_available() else "cpu"

rainbow_dqn = RainbowDQN.load("RainbowDQN_0_200.pt", device=device)

However, I have also updated the framework to be more robust when loading agents onto devices, and you can get these updates by upgrading to the latest version of agilerl.
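
For example, from the same environment:

pip install --upgrade agilerl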

PS. I noticed in your evaluation/rendering script that when calling agent.getAction, you need to add the flag training=False in order to exploit the learned policy:

action, *_ = rainbow_dqn.getAction(state, training=False)

I have updated the tutorial to reflect this too.

Thanks for using AgileRL!