DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Kuka robot turns into a pile of things when reset from VecNormalize is called (CnnPolicy) #510

Closed ArgyChris closed 3 years ago

ArgyChris commented 3 years ago

Hello everyone,

I have just started learning the baselines, and I am encountering the following problem; I would appreciate any help. I am trying to learn to reach an object with the Kuka iiwa robot in pybullet, using image-based observations. I started from the KukaGymEnv example in pybullet, modified some steps, chose the PPO algorithm for reinforcement learning, and swapped the policy network's feature extractor for a custom CNN. I perform a basic environment vectorization with a DummyVecEnv followed by VecNormalize. To monitor what is happening I inspect the observation (image), which is correct after the first reset. However, when VecNormalize is called, the environment is reset again and the robot turns into a pile of parts (see Figures 2 and 3 in the output below). Every subsequent reset produces the same error. I was not able to identify the source of the error, and I would appreciate any input. I should add that the code works fine, without any problem, on an older system without CUDA/GPU support (system 2).

Script for running the robot (stable baselines)

from pybullet_envs.bullet.custom_kukaGymEnv_reach_debug_CNN import KukaGymEnv
from Kuka_CNN import Kuka_CNN
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3.common.env_checker import check_env

import os
import gym
import torch

device_cpu_or_gpu = "cuda" if torch.cuda.is_available() else "cpu"  # informational only; SB3 picks CUDA automatically when available
print('----------device------------')
print(torch.cuda.is_available())

env = KukaGymEnv(renders=False, isDiscrete=False, maxSteps=20)
check_env(env)
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)

custom_policy_kwargs = dict(features_extractor_class=Kuka_CNN, features_extractor_kwargs=dict(features_dim=256))

print(custom_policy_kwargs)

model = PPO('CnnPolicy', env, policy_kwargs=custom_policy_kwargs, verbose=1, tensorboard_log="./test_reach_tensorboard/cnnpolicy_test_ppo_kuka_reach_500k")
print('------------')
model.learn(total_timesteps=int(1e6))  # learn() expects an integer number of timesteps
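
A quick way to see exactly what the wrapped env hands to the agent (my addition, not part of the original script) is to reset the VecNormalize stack and inspect the returned batch; with norm_obs=True the observations are no longer uint8 images but normalized floats:

obs = env.reset()
# shape (1, 64, 64, 3); normalized and clipped to [-10, 10], not a raw image anymore
print(obs.shape, obs.dtype, obs.min(), obs.max())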

Observation and action space

RENDER_HEIGHT=64
RENDER_WIDTH=64
largeValObservation = 100
observationDim = len(self.getExtendedObservation())
observation_high = np.array([largeValObservation] * observationDim)  # leftover from the state-vector variant; unused once the image space below is set
if (self._isDiscrete):
  self.action_space = spaces.Discrete(7)
else:
  action_dim = 3
  self._action_bound = 1
  action_high = np.array([self._action_bound] * action_dim)
  self.action_space = spaces.Box(-action_high, action_high, dtype=np.float32)  # explicit dtype avoids gym's "Box bound precision" warning seen in the output

self.observation_space = spaces.Box(low=0, high=255, shape=(RENDER_HEIGHT, RENDER_WIDTH, 3), dtype=np.uint8)
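
As a side note (not in the original post), SB3 decides whether a Box is an image from its dtype and bounds, which also explains the "Wrapping the env in a VecTransposeImage." line in the output below; a minimal sketch of that check:

import numpy as np
from gym import spaces
from stable_baselines3.common.preprocessing import is_image_space

obs_space = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)
# uint8 Boxes bounded by [0, 255] count as images; channels-last spaces are
# transposed to channels-first automatically before reaching the CNN
assert is_image_space(obs_space)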

custom_kukaGymEnv_reach_debug_CNN (reset, step, getExtendedObservation)

def reset(self):
    self.terminated = 0
    p.resetSimulation()
    p.setPhysicsEngineParameter(numSolverIterations=150)
    p.setTimeStep(self._timeStep)
    p.loadURDF(os.path.join(self._urdfRoot, "plane.urdf"), [0, 0, 0])

    xpos = 0.55 + 0.12 * random.random()
    ypos = 0 + 0.2 * random.random()
    ang = 3.14 * 0.5 + 3.1415925438 * random.random()

    orn = p.getQuaternionFromEuler([0, 0, ang])
    self.blockUid = p.loadURDF(os.path.join(self._urdfRoot, "block.urdf"), basePosition=[xpos, ypos, 0],
                               baseOrientation = [orn[0], orn[1], orn[2], orn[3]], globalScaling=2)

    p.setGravity(0, 0, -10)
    self._kuka = custom_kuka_debug_2.Kuka(urdfRootPath=self._urdfRoot, timeStep=self._timeStep)
    self._envStepCounter = 0
    p.stepSimulation()

    self.rgb_array, _ = self.getExtendedObservation()
    self._observation = self.rgb_array

    return self.rgb_array

def getExtendedObservation(self):

    self._observation = self._kuka.getObservation()
    endEffectorState = p.getLinkState(self._kuka.kukaUid, self._kuka.kukaEndEffectorIndex)
    effectorPos = endEffectorState[0]
    effectorOrn = endEffectorState[1]
    blockPos, blockOrn = p.getBasePositionAndOrientation(self.blockUid)

    invEndEffectorPos, invEndEffectorOrn = p.invertTransform(effectorPos, effectorOrn)
    effectorMat = p.getMatrixFromQuaternion(effectorOrn)
    dir0 = [effectorMat[0], effectorMat[3], effectorMat[6]]
    dir1 = [effectorMat[1], effectorMat[4], effectorMat[7]]
    dir2 = [effectorMat[2], effectorMat[5], effectorMat[8]]

    effectorEul = p.getEulerFromQuaternion(effectorOrn)
    blockPosInEffector, blockOrnInEffector = p.multiplyTransforms(invEndEffectorPos, invEndEffectorOrn,
                                                                blockPos, blockOrn)
    projectedBlockPos2D = [blockPosInEffector[0], blockPosInEffector[1]]
    blockEulerInEffector = p.getEulerFromQuaternion(blockOrnInEffector)

    #we return the relative x,y position and euler angle of block in gripper space
    blockInEffectorPosXYEulZ = [blockPosInEffector[0], blockPosInEffector[1], blockEulerInEffector[2]]
    self._observation.extend(list(blockInEffectorPosXYEulZ))

    base_pos, orn = self._p.getBasePositionAndOrientation(self._kuka.kukaUid)
    view_matrix = self._p.computeViewMatrixFromYawPitchRoll(cameraTargetPosition=base_pos,
                                                            distance=self._cam_dist,
                                                            yaw=self._cam_yaw,
                                                            pitch=self._cam_pitch,
                                                            roll=0,
                                                            upAxisIndex=2)

    proj_matrix = self._p.computeProjectionMatrixFOV(fov=90,
                                                     aspect=float(RENDER_WIDTH) / RENDER_HEIGHT,
                                                     nearVal=0.1,
                                                     farVal=100.0)

    #Get images from camera
    (_, _, px, depthImg, segImg) = self._p.getCameraImage(width=RENDER_WIDTH,
                                              height=RENDER_HEIGHT,
                                              viewMatrix=view_matrix,
                                              projectionMatrix=proj_matrix,
                                              renderer=self._p.ER_BULLET_HARDWARE_OPENGL)
    rgb_array = np.array(px, dtype=np.uint8)
    rgb_array = np.reshape(rgb_array, (RENDER_HEIGHT, RENDER_WIDTH, 4))

    rgb_array = rgb_array[:, :, :3]

    return rgb_array, self._observation
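
Not part of the original report, but a quick consistency check between the rendered frame and the declared observation space (assuming the env constructed as in the script above) could look like this:

import numpy as np

env = KukaGymEnv(renders=False, isDiscrete=False, maxSteps=20)
obs = env.reset()
# the frame must match observation_space exactly: (64, 64, 3) uint8
assert obs.shape == (64, 64, 3)
assert obs.dtype == np.uint8
assert env.observation_space.contains(obs)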

def step(self, action):

    #print("action[0]=", str(action[0]))
    dv = 0.005
    dx = action[0] * dv
    dy = action[1] * dv
    da = action[2] * 0.05
    f = 0.3
    realAction = [dx, dy, -0.002, da, f]
    return self.step2(realAction)

def step2(self, action):
    for i in range(self._actionRepeat):
      self._kuka.applyAction(action)
      p.stepSimulation()
      if self._termination():
        break
      self._envStepCounter += 1
    if self._renders:
      time.sleep(self._timeStep)
    self.rgb_array, self._observation = self.getExtendedObservation()

    done = self._termination()
    npaction = np.array([
        action[3]
    ])  #only penalize rotation until learning works well [action[0],action[1],action[3]])
    actionCost = np.linalg.norm(npaction) * 10.
    reward = self._reward() - actionCost

    return self.rgb_array, reward, done, {}
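
Since the corruption appears on the second reset, a useful isolation step (my suggestion, not from the thread) is to reset the raw environment twice without any SB3 wrapper; if the second frame already shows the pile, the bug lies in the env/pybullet/driver stack rather than in VecNormalize:

import matplotlib.pyplot as plt

env = KukaGymEnv(renders=False, isDiscrete=False, maxSteps=20)
first = env.reset()   # expected: robot intact (Figure 1)
second = env.reset()  # if this already shows the pile, SB3 is not involved

fig, axes = plt.subplots(1, 2)
axes[0].imshow(first)
axes[0].set_title('reset #1')
axes[1].imshow(second)
axes[1].set_title('reset #2')
plt.show()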

Kuka_CNN

from torch import nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=True)

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.conv2 = conv3x3(out_channels, out_channels, stride)

    def forward(self, x):
        residual = x
        out = self.relu(x)
        out = self.conv1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out += residual
        return out

class Kuka_CNN(BaseFeaturesExtractor):

    def __init__(self, observation_space, features_dim=256):
        super().__init__(observation_space, features_dim)
        in_channels = observation_space.shape[0]  # channels x height x width       
        self.cnn = nn.Sequential(
            conv3x3(in_channels=in_channels, out_channels=32),
            nn.MaxPool2d(kernel_size=3, stride=2, dilation=1, ceil_mode=False),
            ResidualBlock(in_channels=32, out_channels=32),
            ResidualBlock(in_channels=32, out_channels=32),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.linear = nn.Sequential(
            nn.Linear(in_features=30752, out_features=features_dim, bias=True),  # 32 channels * 31 * 31 after the stride-2 max pool on a 64x64 input
            nn.ReLU(),
        )

    def forward(self, obs):
        return self.linear(self.cnn(obs))
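
Hard-coding in_features=30752 ties the extractor to 64x64 inputs. The SB3 custom feature extractor docs infer the flattened size with a dummy forward pass instead; a sketch of that variant (same architecture, reusing conv3x3 and ResidualBlock from above, only the Linear sizing changes):

import torch as th
from torch import nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class Kuka_CNN(BaseFeaturesExtractor):

    def __init__(self, observation_space, features_dim=256):
        super().__init__(observation_space, features_dim)
        in_channels = observation_space.shape[0]  # (C, H, W) after VecTransposeImage
        self.cnn = nn.Sequential(
            conv3x3(in_channels=in_channels, out_channels=32),
            nn.MaxPool2d(kernel_size=3, stride=2),
            ResidualBlock(in_channels=32, out_channels=32),
            ResidualBlock(in_channels=32, out_channels=32),
            nn.ReLU(),
            nn.Flatten(),
        )
        # infer the flattened size (30752 for 64x64 inputs) from a dummy forward pass
        with th.no_grad():
            n_flatten = self.cnn(th.as_tensor(observation_space.sample()[None]).float()).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, obs):
        return self.linear(self.cnn(obs))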

custom_kuka_debug_2

def reset(self):

    print('------------real reset------------------------')
    objects = p.loadSDF(os.path.join(self.urdfRootPath, "kuka_iiwa/model.sdf"))
    self.kukaUid = objects[0]

    base_pos, orn = p.getBasePositionAndOrientation(self.kukaUid)

    view_matrix = p.computeViewMatrixFromYawPitchRoll(cameraTargetPosition = base_pos,
                                                            distance = 1.5,
                                                            yaw = 90,
                                                            pitch = -30,
                                                            roll = 0,
                                                            upAxisIndex =2)

    proj_matrix = p.computeProjectionMatrixFOV(fov=90,
                                               aspect=float(640)/640,
                                               nearVal=0.1,
                                               farVal=100.0)

    (_, _, Ipx, depthImg, segImg) = p.getCameraImage(width=640,
                                                     height=640,
                                                     viewMatrix=view_matrix,
                                                     projectionMatrix = proj_matrix,
                                                     renderer=p.ER_BULLET_HARDWARE_OPENGL)

    imgplot = plt.imshow(Ipx)  # debug visualization; assumes matplotlib.pyplot is imported as plt
    plt.show()
    plt.pause(5)

    p.resetBasePositionAndOrientation(self.kukaUid, [-0.100000, 0.000000, 0.070000],
                                      [0.000000, 0.000000, 0.000000, 1.000000])

    self.jointPositions = [0.006418, 0.413184, -0.011401, -1.589317, 0.005379, 1.137684, -0.006539]

    self.numJoints = p.getNumJoints(self.kukaUid)
    for jointIndex in range(self.numJoints):
      p.resetJointState(self.kukaUid, jointIndex, self.jointPositions[jointIndex])
      p.setJointMotorControl2(self.kukaUid,
                              jointIndex,
                              p.POSITION_CONTROL,
                              targetPosition=self.jointPositions[jointIndex],
                              force=self.maxForce)

    self.endEffectorPos = [0.537, 0.0, 0.5]
    self.endEffectorAngle = 0

    self.motorNames = []
    self.motorIndices = []

    for i in range(self.numJoints):
      jointInfo = p.getJointInfo(self.kukaUid, i)
      #print('-----------jointInfo-----------')
      #print(jointInfo)

      qIndex = jointInfo[3]
      if qIndex > -1:
        self.motorNames.append(str(jointInfo[1]))
        self.motorIndices.append(i)
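
Given that the problem later turned out to depend on the CUDA/driver stack, one cheap diagnostic (my suggestion, not from the thread) is to re-capture the debug image with pybullet's software renderer: if the pile disappears with ER_TINY_RENDERER, the corruption is in GPU rendering; if it persists, the physics state itself is broken:

(_, _, Ipx, depthImg, segImg) = p.getCameraImage(width=640,
                                                 height=640,
                                                 viewMatrix=view_matrix,
                                                 projectionMatrix=proj_matrix,
                                                 renderer=p.ER_TINY_RENDERER)  # CPU renderer, bypasses the GPU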

Output

pybullet build time: Jul 12 2021 16:50:13
current_dir=/home/argy/anaconda3/envs/basic_env/lib/python3.9/site-packages/pybullet_envs/bullet
current_dir=/home/argy/anaconda3/envs/basic_env/lib/python3.9/site-packages/pybullet_envs/bullet
-------device------
True
------------real reset------------------------

[Figure 1: robot rendered intact after the first reset]

/home/argy/anaconda3/envs/basic_env/lib/python3.9/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
------------real reset------------------------

[Figure 2: robot collapsed into a pile of parts after the second reset]

{'features_extractor_class': <class 'Kuka_CNN.Kuka_CNN'>, 'features_extractor_kwargs': {'features_dim': 256}}
Using cuda device
Wrapping the env in a VecTransposeImage.
------------
------------real reset------------------------

[Figure 3: the same corrupted scene after the reset triggered during PPO setup]

System Info

Describe the characteristics of your environment. System 1 (with the error):

Additional context

System 2, where I do not encounter the problem:

araffin commented 3 years ago

Hello,

perform a basic environment vectorization with a DummyVecEnv followed by VecNormalize.

Why do you use VecNormalize with images? You should at least deactivate the obs normalization, as it is done automatically for images (cf. the docs), by passing norm_obs=False.
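
Applied to the script above, the suggested change would look like this (a sketch; image observations are already scaled to [0, 1] by CnnPolicy, so only the reward is normalized):

env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=False, norm_reward=True)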

ArgyChris commented 3 years ago

Hello,

Thank you for the reply. I will keep in mind to deactivate the obs normalization.

Upon further investigation, it seems there is a problem when I use the latest PyTorch build with CUDA 11.3 support. I found that if I use the same configuration as the older system, the problem goes away and the robot model loads correctly in reset. So I created a new conda environment with the following configuration, which works:

Python: 3.8.5
PyTorch: 1.7.1+cu110
Gym: 0.18.3
stable_baselines3: 1.0

This configuration still supports the latest GPU.

Thank you, Argy

araffin commented 3 years ago

it seems that there is a problem if I use the latest pytorch that supports CUDA=11.3.

So, I created a new conda environment with the following that works.

then it does not seem related to SB3...

araffin commented 3 years ago

then it does not seem related to SB3...

Closing because this is probably not related to SB3.