DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Atari breakout platform glitching #1020

Closed Shivam310 closed 2 years ago

Shivam310 commented 2 years ago

Question

The Atari Breakout paddle doesn't move and just sticks to the right side.

Additional context

I was working on my Atari Breakout model and the project is finally complete, but it doesn't work as shown in the tutorial. Here are two screenshots of what it looks like: [screenshots: 2022-07-30, 2022-07-30 (1)]

The link to the tutorial I was following: https://www.youtube.com/watch?v=Mut_u40Sqz4&t=7695s&ab_channel=NicholasRenotte. I am using the pretrained model from the tutorial itself, as my machine was taking too long to train the model.

Code

#Import Dependencies
import gym
from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.env_util import make_vec_env

import os
from gym.utils import play
from stable_baselines3.ddpg.policies import CnnPolicy

from ale_py import ALEInterface
from ale_py.roms import Breakout

ale = ALEInterface()

# Load the Breakout ROM and create the vectorized Atari environment
ale.loadROM(Breakout)
env = make_atari_env('BreakoutNoFrameskip-v4', seed=0)
log_path = os.path.join('Training', 'Logs')

# Create a fresh A2C model (note: it is never trained before saving)
model = A2C('CnnPolicy', env, verbose=1, tensorboard_log=log_path)

# Save the model, then reload it and evaluate
a2c_path = os.path.join('Training', 'Logs', 'A2C_2M_model')
model.save(a2c_path)
env.observation_space  # inspect the observation space (no effect in a script)
del model
model = A2C.load(a2c_path, env)
evaluate_policy(model, env, n_eval_episodes=100, render=True)


qgallouedec commented 2 years ago

Actually, you save an untrained model, then load it, and evaluate it. Is that what you intend to do?

If not, I suggest you read the documentation on how to load a pre-trained model.
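
For illustration, a minimal sketch of the intended order (train, then save, then reload and evaluate); the timestep budget here is a placeholder chosen for the example, and the frame stacking matches the tutorial's setup:

from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Create the vectorized Atari environment, with frame stacking as in the tutorial
env = VecFrameStack(make_atari_env('BreakoutNoFrameskip-v4', seed=0), n_stack=4)

# Create the model and actually train it before saving
model = A2C('CnnPolicy', env, verbose=1)
model.learn(total_timesteps=100_000)  # placeholder budget, for illustration only

# Save, reload, and evaluate the trained model
model.save('a2c_breakout')
del model
model = A2C.load('a2c_breakout', env=env)
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)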

Shivam310 commented 2 years ago

Actually, you save an untrained model, then load it, and evaluate it. Is that what you intend to do?

If not, I suggest you read the documentation on how to load a pre-trained model.

I don't intend to load an untrained model. I wasn't able to find the docs on how to load a pretrained model. Can you please link them? Thanks a lot for the response.

qgallouedec commented 2 years ago

Here: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html?basic-usage-training-saving-loading

A code example using LunarLander and DQN:

import gym

from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

# Create environment
env = gym.make('LunarLander-v2')

# Load the trained agent
model = DQN.load("your_trained_agent_path", env=env)

# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
Shivam310 commented 2 years ago

Hi, I tried it, but I can't get it to work. The paddle still sticks to the right side, occasionally moving to the left. Code:

import gym
from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.env_util import make_vec_env

import os
from gym.utils import play
from stable_baselines3.ddpg.policies import CnnPolicy

from ale_py import ALEInterface
from ale_py.roms import Breakout

ale = ALEInterface()

# Load the ROM and create the Atari environment
ale.loadROM(Breakout)
env = make_atari_env('BreakoutNoFrameskip-v4', seed=0)
log_path = os.path.join('Training', 'Logs')

# Load the tutorial's pretrained model and evaluate it
model = A2C.load('C:/Users/shiva/Documents/Atari_Breakout_RL_Project/Training/Logs/A2C_2M_model', env=env)
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, render=True)

Note: I am using the same zip file, i.e. A2C_2M_model.zip, which I got from this tutorial: https://www.youtube.com/watch?v=Mut_u40Sqz4&t=7695s&ab_channel=NicholasRenotte. Also, thanks for actively helping me, a noob.

Shivam310 commented 2 years ago

Here: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html?basic-usage-training-saving-loading

A code example using LunarLander and DQN:

import gym

from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

# Create environment
env = gym.make('LunarLander-v2')

# Load the trained agent
model = DQN.load("your_trained_agent_path", env=env)

# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)

Please help.

qgallouedec commented 2 years ago

I can't tell whether the agent you sent was properly trained. However, there is a repo with trained baseline agents: rl-trained-agents. You will find a trained A2C agent for Breakout there, for example.

from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Create environment
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4"), n_stack=4)

# Load the trained agent
model = A2C.load("BreakoutNoFrameskip-v4", env=env)

# Evaluate the agent
mean_reward, std_reward = evaluate_policy(model, env, render=True)
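
Note that this example wraps the environment in VecFrameStack with n_stack=4; evaluating a model with an environment wrapped differently from the one used during training (for example, without frame stacking) can make the agent behave poorly. The load path above assumes the BreakoutNoFrameskip-v4 zip from the rl-trained-agents repo has been downloaded into the working directory.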