duburcqa / jiminy

Jiminy: a fast and portable Python/C++ simulator of poly-articulated robots with OpenAI Gym interface for reinforcement learning
https://duburcqa.github.io/jiminy
MIT License
223 stars 26 forks source link

[python/gym] Unable to run simple Atlas simulation because truncated is true immediately #781

Closed mwulfman closed 5 months ago

mwulfman commented 5 months ago

Reproduction scripts:

import numpy as np
from gym_jiminy.envs.atlas import AtlasJiminyEnv
from jiminy_py import tree

env = AtlasJiminyEnv()
for i in range(100):
    obs, _ = env.reset(seed = i)
    terminated, truncated = False, False
    while not (terminated or truncated):
        assert not any(np.isnan(e).any() for e in tree.flatten(obs)), "Nan in observation"
        obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
        print(terminated, truncated, not env._contains_observation(), env.observation_space.contains(obs))
        # truncated is true immediately because of env._contains_observation()
        truncated = env.stepper_state.t >= env.simulation_duration_max - env.step_dt # comment this line to make it work
        env.replay()

I am not able to run an Atlas simulation for more than step_dt because truncated is set to true by env._contains_observation() after the first step. I am not sure this is the desired behavior. Since the observation is clipped by the observation_space after being returned by step, it seems contradictory to return truncated = True when the unclipped observation is outside the observation_space. Even though, even without clipping the observation, the simulator should not return observation outside the observation_space, no ?

Difference between obs and env.observation (respectively clipped and unclipped) are just observed on ["states"]["agent"]["v"] and the v_measurement of ["measurements"]["EncoderSensor"].

duburcqa commented 5 months ago

It is not clear to me if we should set terminated = True if the observation is out of bounds. After checking, you are violating the maximum velocity limit. You are exceeding the hardware specification, so it cannot be executed on the robot. I have the felling it is a real problem and the simulation should be terminated for this.

duburcqa commented 5 months ago

Maybe the actual problem is coming from the fact that those specifications are not really working this way in really. In the real world, velocity limit does not exist. If you have some external source of torque, you could spine the joint as fast as you want. In practice, there is a relation between motor torque and angular velocity which limits the measured velocity ultimately. Without external source of energy, the joint velocity is therefore limited.

What I said previous still stand. I don't know what we want to do in this scenario. It was specified in the URDF that the joints are not supposed to spine faster than some maximum velocity, and I think we should honour that by default. Whether some command torque is capable of invalidating the current state is an another issue. Here it is the case because the actuators are very powerful. But I'm also expecting the system to break if you start standing random torques to the actuators in reality, so it is not a false positive.

duburcqa commented 5 months ago

In my view, It is the responsibility of the practitioner to send sensible commands to the robot to avoid invalidating the robot state. As much as you cannot explore randomly in reality, you should be careful when exploring in simulation. Such issue would not appear when plugging a low-level PD controller with velocity bounds for instance.

mwulfman commented 5 months ago

Would it be possible to send an additional warning message when the commanded torque is unfeasible given the velocity limits ? I feel like this behavior is not explicit enough, especially in the learning situation.

duburcqa commented 5 months ago

How to detect this is not straightforward as it depends on the dynamics of the whole system. Maybe some estimate based on the apparent inertia could be computed but it would only be rough approximation as it also involves all the external forces, mechanical constraints...etc.

I agree it would be nice to detect such issue differently than just detecting than the observation is out of bounds but I'm not sure it can be predicted reliably. I'm still convinced that controlling a robot in torque is tricky and should be avoided if possible, one of the reason being unreasonable joint velocities when sending random commands.

mwulfman commented 5 months ago

Ok, yes I understand your point. Does Atlas has default acceleration limit as well ? The same issue appears with this new code (unless I lower the action sampling range)

import numpy as np
from gym_jiminy.common.bases import DT_EPS
from gym_jiminy.common.blocks.proportional_derivative_controller import (
    pd_adapter,
    pd_controller,
)
from gym_jiminy.common.utils import is_breakpoint, sample
from gym_jiminy.envs.atlas import PD_FULL_KD, PD_FULL_KP, AtlasJiminyEnv
from jiminy_py import tree

PD_KP = np.array(PD_FULL_KP)
PD_KD = np.array(PD_FULL_KD)

env = AtlasJiminyEnv(step_dt=0.005)

nmotors = env.robot.nmotors
# Limits
position_limit_lower = env.robot.position_limit_lower[env.robot.motor_position_indices].squeeze()
position_limit_upper = env.robot.position_limit_upper[env.robot.motor_position_indices].squeeze()
velocity_limit = env.robot.velocity_limit[env.robot.motor_velocity_indices]
motors_effort_limit = env.robot.command_limit[env.robot.motor_velocity_indices]
acceleration_limit = np.full((nmotors,), float("inf"))
# Buffers
desired_command_acceleration = np.full((nmotors,), float("nan"))
command_torque = np.full((nmotors,), 0.0)
command_state = np.full((3, nmotors,), 0.0)
command_state_lower = np.stack((position_limit_lower, -velocity_limit, -acceleration_limit))
command_state_upper =  np.stack((position_limit_upper, velocity_limit, acceleration_limit))

# Target position
neutral_motor_position = env._neutral()[env.robot.motor_position_indices].squeeze()
target_dt = 0.05

for i in range(100):
    obs, _ = env.reset(seed = i)
    terminated, truncated = False, False
    while not (terminated or truncated):
        assert not any(np.isnan(e).any() for e in tree.flatten(obs)), "Nan in observation"
        q_measured, v_measured = env.observation["measurements"]["EncoderSensor"]
        if env.stepper_state.t < env.step_dt:
            command_state[0] = q_measured
            command_state[1] = v_measured
        if is_breakpoint(env.stepper_state.t, target_dt, DT_EPS):
            action = sample(low=-velocity_limit, high=velocity_limit, rg=env.np_random)
            pd_adapter(action, 1,
                    command_state,
                    command_state_lower,
                    command_state_upper,
                    target_dt,
                    desired_command_acceleration)
        command_state[2] = desired_command_acceleration
        pd_controller(q_measured,
                      v_measured,
                      command_state,
                      command_state_lower,
                      command_state_upper,
                      PD_KP,
                      PD_KD,
                      motors_effort_limit,
                      env.step_dt,
                      command_torque)
        obs, _, terminated, truncated, _ = env.step(command_torque)
    env.simulator.stop()
    env.replay()
duburcqa commented 5 months ago

You should have a look to AtlasPDControlJiminyEnv pipeline environment. Max velocity and acceleration are specified to avoid this issues. The base AtlasJiminyEnv is exposed for expert but it is not really supposed to be used as-is directly.

mwulfman commented 5 months ago

Ok thanks, I will have a look.

duburcqa commented 5 months ago

when I am respecting the position, velocity and acceleration limits

You should be more restrictive than that. Satisfying the limits for the command is not enough to make sure that the measure is not out of bounds. That is why the threshold are much lower in AtlasPDControlJiminyEnv.