Bug with UR5e Lift OSC Pose Experiment

I want to try SAC with the UR5e Lift OSC Pose environment, so I modified the Panda variant like this:

{
  "algorithm": "SAC",
  "algorithm_kwargs": {
    "batch_size": 128,
    "eval_max_path_length": 500,
    "expl_max_path_length": 500,
    "min_num_steps_before_training": 3300,
    "num_epochs": 2000,
    "num_eval_steps_per_epoch": 2500,
    "num_expl_steps_per_train_loop": 2500,
    "num_trains_per_train_loop": 1000
  },
  "eval_environment_kwargs": {
    "control_freq": 20,
    "controller": "OSC_POSE",
    "env_name": "Lift",
    "hard_reset": false,
    "horizon": 500,
    "ignore_done": true,
    "reward_scale": 1.0,
    "robots": [
      "UR5e"
    ]
  },
  "expl_environment_kwargs": {
    "control_freq": 20,
    "controller": "OSC_POSE",
    "env_name": "Lift",
    "hard_reset": false,
    "horizon": 500,
    "ignore_done": true,
    "reward_scale": 1.0,
    "robots": [
      "UR5e"
    ]
  },
  "policy_kwargs": {
    "hidden_sizes": [
      256,
      256
    ]
  },
  "qf_kwargs": {
    "hidden_sizes": [
      256,
      256
    ]
  },
  "replay_buffer_size": 1000000,
  "seed": 59,
  "trainer_kwargs": {
    "discount": 0.99,
    "policy_lr": 0.001,
    "qf_lr": 0.0005,
    "reward_scale": 1.0,
    "soft_target_tau": 0.005,
    "target_update_period": 5,
    "use_automatic_entropy_tuning": true
  },
  "version": "normal"
}

The experiment starts normally, but a MuJoCo error appears after 68 epochs:

Traceback (most recent call last):
  File "/home/iiismig-pub1/robosuite-benchmark/scripts/train.py", line 131, in <module>
    run_experiment()
  File "/home/iiismig-pub1/robosuite-benchmark/scripts/train.py", line 104, in run_experiment
    experiment(variant, agent=args.agent)
  File "/home/iiismig-pub1/robosuite-benchmark/util/rlkit_utils.py", line 163, in experiment
    algorithm.train()
  File "/home/iiismig-pub1/robosuite-benchmark/util/rlkit_custom.py", line 46, in train
    self._train()
  File "/home/iiismig-pub1/robosuite-benchmark/util/rlkit_custom.py", line 216, in _train
    discard_incomplete_paths=True,
  File "/home/iiismig-pub1/rlkit/rlkit/samplers/data_collector/path_collector.py", line 45, in collect_new_paths
    max_path_length=max_path_length_this_loop,
  File "/home/iiismig-pub1/rlkit/rlkit/samplers/rollout_functions.py", line 113, in rollout
    next_o, r, d, env_info = env.step(a)
  File "/home/iiismig-pub1/rlkit/rlkit/envs/wrappers.py", line 161, in step
    wrapped_step = self._wrapped_env.step(scaled_action)
  File "/home/iiismig-pub1/anaconda3/envs/rb_bench/lib/python3.7/site-packages/robosuite/wrappers/gym_wrapper.py", line 102, in step
    ob_dict, reward, done, info = self.env.step(action)
  File "/home/iiismig-pub1/anaconda3/envs/rb_bench/lib/python3.7/site-packages/robosuite/environments/base.py", line 281, in step
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/iiismig-pub1/anaconda3/envs/rb_bench/lib/python3.7/site-packages/mujoco_py/builder.py", line 364, in user_warning_raise_exception
    raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 10.4640.

I tried different seeds, but the error still occurs after exactly 68 epochs. Any ideas?

Hi @YeeCY,

Apologies for the delay -- somehow missed this notification. Unlike all the other robot arms in robosuite, the UR5e robot only has 6DOF -- which means that there's no extra redundancy when trying to control the full 6DOF robot pose (x, y, z, ax, ay, az). So my initial thought is that the agent is easily reaching a pose singularity, resulting in unstable sim behavior (i.e.: the NaNs you're getting).
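To illustrate why a non-redundant arm is sensitive to singularities, here is a minimal sketch using a 2-link planar arm controlling a 2D position. It is a toy stand-in for the 6DOF case, not the UR5e kinematics, and the function names and link lengths are hypothetical. When the arm is nearly straight, the Jacobian is close to singular and the joint speeds needed for the same end-effector velocity grow very large:

```python
import numpy as np

def planar_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian of a 2-link planar arm (toy model, not the UR5e)."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])

def joint_speed_for(q1, q2, ee_velocity):
    """Norm of the joint velocity needed to realize a given end-effector velocity."""
    J = planar_jacobian(q1, q2)
    # Square Jacobian: no redundancy, so the solution is unique.
    return np.linalg.norm(np.linalg.solve(J, ee_velocity))

ee_velocity = np.array([0.1, 0.0])

# Elbow clearly bent: well-conditioned Jacobian, modest joint speeds.
bent = joint_speed_for(0.3, 1.2, ee_velocity)

# Elbow almost straight: det(J) = l1 * l2 * sin(q2) is close to zero.
nearly_straight = joint_speed_for(0.3, 1e-4, ee_velocity)

print(f"bent: {bent:.3f} rad/s, nearly straight: {nearly_straight:.1f} rad/s")
```

With a redundant 7DOF arm, the extra joint gives the controller some freedom to avoid such configurations; with a square Jacobian there is no such freedom.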

What is strange though is that this error is occurring after exactly 68 epochs every time -- especially if you randomized the seed.

Can you try the following? UR5e wasn't tuned that much, so it could very well be a modeling problem.

In controllers/osc.py, change L176-L177 to the following:

self.position_limits = np.array(position_limits) if position_limits is not None else position_limits
self.orientation_limits = np.array(orientation_limits) if orientation_limits is not None else orientation_limits

This fixes a small bug that I've fixed on a separate branch but haven't merged into master yet.

Then, change the position_limits in controllers/config/osc_pose.py to [[-0.25, -0.25, 0.8], [0.25, 0.25, 1.1]]. This will prevent the agent from being able to command the robot arm outside of its workspace and causing a singularity.
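As a rough illustration of what such limits do, the sketch below clips a commanded end-effector position into the box defined by the two corner points. This is not robosuite's actual implementation; `clip_target_position` is a hypothetical helper, and only the limit values come from the suggestion above:

```python
import numpy as np

# [[x_min, y_min, z_min], [x_max, y_max, z_max]], values from the suggestion above.
position_limits = np.array([[-0.25, -0.25, 0.8], [0.25, 0.25, 1.1]])

def clip_target_position(target_pos, limits):
    """Clamp a commanded position into the axis-aligned box given by limits."""
    lower, upper = limits
    return np.clip(target_pos, lower, upper)

# A command well outside the box is pulled back to its nearest face or corner.
clipped = clip_target_position(np.array([0.6, -0.1, 0.5]), position_limits)
print(clipped)
```

Components already inside the box (here the y value) pass through unchanged, so the limits only affect commands that would leave the workspace.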

Lastly, go to models/assets/robots/ur5e/robot.xml, set every joint's damping value to 0.001, and add frictionloss=0.01 to each of those joints. This will improve the actuator stability of the model.
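If editing the joints by hand is tedious, the change can be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` on a small toy MJCF fragment; the fragment's structure and the `soften_joints` helper are assumptions for illustration, and the real robot.xml may be organized differently, so check the result before training with it:

```python
import xml.etree.ElementTree as ET

def soften_joints(root, damping="0.001", frictionloss="0.01"):
    """Set damping and frictionloss on every <joint> under <worldbody>.

    Returns the number of joints changed.
    """
    count = 0
    # Restrict to <worldbody> so <joint> entries elsewhere (e.g. <default>) are untouched.
    for worldbody in root.iter("worldbody"):
        for joint in worldbody.iter("joint"):
            joint.set("damping", damping)
            joint.set("frictionloss", frictionloss)
            count += 1
    return count

# Toy fragment standing in for the real robot.xml (illustrative structure only).
toy_xml = """
<mujoco>
  <worldbody>
    <body name="link1">
      <joint name="joint1" type="hinge" damping="0.1"/>
      <body name="link2">
        <joint name="joint2" type="hinge" damping="0.1"/>
      </body>
    </body>
  </worldbody>
</mujoco>
""".strip()

root = ET.fromstring(toy_xml)
n_changed = soften_joints(root)
patched = ET.tostring(root, encoding="unicode")
print(n_changed)
print(patched)
```

To apply this to a file, one would load it with `ET.parse(path)`, call the helper on `tree.getroot()`, and save with `tree.write(path)`, ideally after backing up the original.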

After making these changes, can you try training the agent again? I tested the above configuration and it seems to control a bit better, at least from a teleoperation standpoint.

ARISE-Initiative / robosuite-benchmark
