google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

How can I get a true Python dynamics model of an OpenAI Gym MuJoCo task for model predictive control? #1807

Closed · shizhec closed this issue 2 months ago

shizhec commented 2 months ago

Hi,

I'm a student and new to the field of optimal control. I'm trying to use MPC to solve the Gymnasium MuJoCo environments for tasks like Walker2d, Ant, Hopper, etc. Since MPC requires prior knowledge of the system's dynamics model, how can I obtain the dynamics model for those MuJoCo tasks?

I am using the Python MuJoCo package version 2.3.7 with Gymnasium version 0.29.1, so a Python version of the dynamics would be preferred.

For reference, here is the Python version of the dynamics step function for the Pendulum task in Gym:

def step(self, states, actions, params_dict=None):
    """Receives tensors of current states and actions and computes the
    states for the subsequent timestep. If sampled parameters are provided,
    these must be used; otherwise default model parameters are used.

    Must be bounded by the observation and action spaces.

    :param states: A tensor containing the current states of one or multiple
        trajectories.
    :type states: torch.Tensor
    :param actions: A tensor containing the next planned actions of one or
        multiple trajectories.
    :type actions: torch.Tensor
    :param params_dict: Samples for the uncertain system parameters. Note
        that the number of samples must be either 1 or the number of
        trajectories. If 1, a single sample is used for all trajectories;
        otherwise one sample is used per trajectory.
    :type params_dict: dict
    :returns: A tensor with the next states of one or multiple trajectories.
    :rtype: torch.Tensor
    """
    dt = self.dt
    # Split the state tensor into angle and angular velocity, keeping dims.
    theta, theta_d = states.clone().chunk(2, dim=-1)
    if params_dict is not None:
        # Override the default parameters with the sampled ones.
        batch_params = self.params_dict.copy()
        for key in params_dict.keys():
            batch_params[key] = params_dict[key]
        g, m, length = batch_params.values()
    else:
        g, m, length = self.params_dict.values()
    # Clamp actions to the torque limits, then integrate the pendulum dynamics.
    acts = actions.clamp(min=-self.__max_torque, max=self.__max_torque)
    theta_d = theta_d + dt * (
        -3 * g / (2 * length) * (theta + math.pi).sin()
        + 3.0 / (m * length ** 2) * acts
    )
    theta_d = theta_d.clamp(-self.__max_speed, self.__max_speed)
    theta = theta + theta_d * dt  # semi-implicit Euler: use the new theta_d
    return torch.cat((theta, theta_d), dim=-1)
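
(For clarity, a call to this function over a batch of trajectories looks roughly like the following; pendulum is just a placeholder instance of the class the method above belongs to, and the shapes are illustrative:)

import torch

# Illustrative batch of 8 trajectories: state = (theta, theta_dot), action = torque.
states = torch.zeros(8, 2)
actions = torch.zeros(8, 1)
next_states = pendulum.step(states, actions)  # -> tensor of shape (8, 2)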

I also found an issue in the Gymnasium repo requesting the same thing, but with no solution yet: Farama-Foundation/Gymnasium#605

I'd appreciate it if someone could help me with this.

yuvaltassa commented 2 months ago

how can I obtain the dynamics model for those MuJoCo tasks?

Not clear what you are asking here. Computing the dynamics is exactly what the library does. MuJoCo is the dynamics.

yuvaltassa commented 2 months ago

Regarding this

I also found an issue in the Gymnasium repo requesting the same thing, but with no solution yet.

@Kallinteris-Andreas is basically giving the same answer I did.

shizhec commented 2 months ago

Regarding this

I also found an issue in the Gymnasium repo requesting the same thing, but with no solution yet.

@Kallinteris-Andreas is basically giving the same answer I did.

Hi Yuval @yuvaltassa, thanks for your time and the answer, and sorry that I am still a bit confused. Regarding the dynamics step function I provided for the pendulum example: I found it in MPC works like https://github.com/locuslab/mpc.pytorch. It takes the current state and action and outputs the next state, so that MPC can predict a sequence of states from an action sequence and compute the cost to optimize the actions. I'm wondering if I could have the same kind of step function that takes the current state and action of the MuJoCo robot as input and returns the next state (qpos, qvel)? Or do you suggest that I just create another env, set its state, and call gym.step? (A rough sketch of that second idea is below.)
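
Something like this rough sketch is what I have in mind for the second option (Walker2d-v4 is just an example task, and set_state comes from Gymnasium's MujocoEnv base class):

import gymnasium as gym

# Rough sketch of the "clone the env, set its state, and step it" idea.
# set_state(qpos, qvel) is provided by gymnasium's MujocoEnv base class.
env = gym.make("Walker2d-v4")
env.reset()

def predict_next_state(qpos, qvel, action):
    """Set the simulator to (qpos, qvel), apply the action, return the next (qpos, qvel)."""
    env.unwrapped.set_state(qpos, qvel)
    env.step(action)
    data = env.unwrapped.data
    return data.qpos.copy(), data.qvel.copy()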

yuvaltassa commented 1 month ago

Yes you can, this is what MuJoCo does. The function is called mj_step.

Please read the Overview chapter.
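
For concreteness, a minimal sketch of wrapping mj_step as a state-to-next-state function could look like this (the XML path, function name, and n_substeps are illustrative, not part of this thread; Gymnasium tasks usually advance the physics several times per env step via frame_skip):

import mujoco

# Load the same model XML that the Gymnasium task uses (path is a placeholder).
model = mujoco.MjModel.from_xml_path("walker2d.xml")
data = mujoco.MjData(model)

def dynamics_step(qpos, qvel, ctrl, n_substeps=1):
    """f(state, action) -> next state, where state = (qpos, qvel) and action = ctrl."""
    mujoco.mj_resetData(model, data)   # clear warm starts, activations, etc.
    data.qpos[:] = qpos
    data.qvel[:] = qvel
    data.ctrl[:] = ctrl
    for _ in range(n_substeps):        # match the task's frame_skip if needed
        mujoco.mj_step(model, data)
    return data.qpos.copy(), data.qvel.copy()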

shizhec commented 1 month ago

Yes you can, this is what MuJoCo does. The function is called mj_step.

Please read the Overview chapter.

Thanks for the answer!