HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

Torch Cuda Error #668

Closed: mertalbaba closed this issue 1 year ago

mertalbaba commented 1 year ago

Bug description

Running the code under "Steps to reproduce" below produces the following error:

Traceback (most recent call last):
  File "main.py", line 73, in <module>
    gail_trainer.train(20000)
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/imitation/algorithms/adversarial/common.py", line 452, in train
    self.train_disc()
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/imitation/algorithms/adversarial/common.py", line 346, in train_disc
    for batch in batch_iter:
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/imitation/algorithms/adversarial/common.py", line 598, in _make_disc_train_batches
    log_policy_act_prob = self._get_log_policy_act_prob(obs_th, acts_th)
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/imitation/algorithms/adversarial/common.py", line 504, in _get_log_policy_act_prob
    scaled_acts_th = self.policy.scale_action(acts_th)
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/stable_baselines3/common/policies.py", line 371, in scale_action
    return 2.0 * ((action - low) / (high - low)) - 1.0
  File "/local/home/.virtualenvs/gailenv/lib/python3.8/site-packages/torch/_tensor.py", line 956, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
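
For reference, this is the standard PyTorch error raised whenever NumPy tries to convert a tensor that still lives on the GPU. A minimal standalone snippet (independent of imitation) that triggers the same message:

import numpy as np
import torch

t = torch.zeros(3, device="cuda:0")  # any tensor living on the GPU
t.cpu().numpy()                      # fine: copy to host memory first
np.asarray(t)                        # raises the TypeError shown in the traceback above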

Steps to reproduce

I used the GAIL training code shared in your documentation with the Humanoid-v3 environment. The code is shown below:

from stable_baselines3 import SAC
import numpy as np
import torch
import gym
from stable_baselines3.common.evaluation import evaluate_policy

from imitation.algorithms.adversarial.gail import GAIL
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from imitation.util.util import make_vec_env

device = torch.device('cuda:0')
torch.cuda.empty_cache()

print("Loading expert demonstrations...")
rng = np.random.default_rng(0)

env = gym.make("Humanoid-v3")
expertAgent = SAC.load("path/to/file/humanoid.zip", print_system_info=False)

print("Rollouts...")
rollouts = rollout.rollout(
    expertAgent,
    make_vec_env(
        "Humanoid-v3",
        n_envs=5,
        post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
        rng=rng,
    ),
    rollout.make_sample_until(min_timesteps=None, min_episodes=60),
    rng=rng,
)

print("Training...")
venv = make_vec_env("Humanoid-v3", n_envs=8, rng=rng)
learner = SAC("MlpPolicy", venv, verbose=1)
reward_net = BasicRewardNet(
    venv.observation_space,
    venv.action_space,
    normalize_input_layer=RunningNorm,
)

gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=2048,
    n_disc_updates_per_round=4,
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
)

gail_trainer.train(20000)
rewards, _ = evaluate_policy(learner, venv, 100, return_episode_rewards=True)
print("Rewards:", rewards)

Environment

mertalbaba commented 1 year ago

FYI, the problem is in the _get_log_policy_act_prob function (line 504 of imitation/algorithms/adversarial/common.py). This function takes torch tensor inputs and forwards them to stable_baselines3's scale_action function, which expects numpy arrays, not tensors.
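
A minimal sketch of one possible local workaround (not necessarily what the upstream fix looks like) is to detour through the CPU before calling scale_action and move the result back afterwards; scale_actions_safely below is just an illustrative helper name:

import torch

def scale_actions_safely(policy, acts_th: torch.Tensor) -> torch.Tensor:
    # scale_action expects a NumPy array, so copy the actions to host memory first
    acts_np = acts_th.detach().cpu().numpy()
    scaled_np = policy.scale_action(acts_np)
    # move the scaled actions back onto the original device for the rest of the pipeline
    return torch.as_tensor(scaled_np, device=acts_th.device, dtype=acts_th.dtype)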

ernestum commented 1 year ago

Thanks for the bug report. This is probably a duplicate of #655; we already have a fix for it in #660. Try checking out that PR as long as we have not merged it yet.

ernestum commented 1 year ago

Fixed by #660