Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Velocity estimation #40

Closed mertgungor closed 12 months ago

mertgungor commented 1 year ago

Hello,

I have been studying your project and reading the associated paper, which mentions using a state estimator network for velocity estimation. However, while examining the code, I was unable to locate the implementation of this network.

Could you please provide some guidance on where I can find the code related to the state estimator network? Additionally, if there are any specific functions or files I should look into, please let me know.

Thank you for your assistance!

Robokan commented 1 year ago

Have you looked at cheetah_state_estimator under go1_gym_deploy/utils? This is what actually runs on the Go1 robot.

mertgungor commented 1 year ago

Yes, I have looked at this file, but I do not see where the linear velocity is updated. Moreover, I do not see any NN for training an estimator or any use of a pre-trained model. As far as I understand from the papers I have read, there needs to be a two-layer MLP for state estimation. Am I missing something?

Qianzhong-Chen commented 1 year ago

I have the same confusion here. It seems that you use this function to get the linear velocity:

    def get_body_linear_vel(self):
        self.body_lin_vel = np.dot(self.R.T, self.world_lin_vel)
        return self.body_lin_vel

but how can I get the world_lin_vel?
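
For readers following the snippet above: it only changes reference frames and does not itself estimate anything. Below is a minimal NumPy sketch of that transform, assuming R is the body-to-world rotation matrix (so R.T maps world-frame vectors into the body frame). The helper names and the numeric values are made up for illustration, and the sketch says nothing about where world_lin_vel comes from in the deployment code.

```python
import numpy as np

def yaw_rotation(yaw):
    """Body-to-world rotation for a pure yaw angle in radians (hypothetical helper)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def world_to_body(R, world_lin_vel):
    """Rotate a world-frame velocity into the body frame, assuming R is body-to-world."""
    return np.dot(R.T, world_lin_vel)

# Illustrative values: the robot faces +y in the world (yaw = 90 degrees)
# and moves along world +y at 1 m/s.
R = yaw_rotation(np.pi / 2)
world_lin_vel = np.array([0.0, 1.0, 0.0])

# In the body frame this is pure forward motion, approximately [1, 0, 0].
body_lin_vel = world_to_body(R, world_lin_vel)
print(np.round(body_lin_vel, 3))
```

The transpose is used because the inverse of a rotation matrix is its transpose, so no matrix inversion is needed.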

gmargo11 commented 12 months ago

Hi @mertgungor @Qianzhong-Chen ,

The default policy I released differs from the paper here. To estimate the velocity, you'll need to train a new policy with the following flag turned on in train.py:

Cfg.env.priv_observe_body_velocity = True

Setting this flag to True during training makes the new adaptation module predict the velocity from the state history. First, the privileged observation is constructed here, where you can see that the priv_observe_body_velocity flag controls whether the body velocity is included. Then, the gradient step for the adaptation module is defined here; this step trains the adaptation module to predict the privileged observation using supervised learning. Finally, you can see here how the latent is predicted by the adaptation module and concatenated to the policy observation in each forward pass.

-Gabe
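
The three steps described in the reply above can be sketched roughly as follows. This is a toy NumPy illustration, not the repository's code: the dimensions, the synthetic data, the function names, and the single linear layer standing in for the adaptation module are all assumptions made for the example. Only the flag name priv_observe_body_velocity comes from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
HISTORY_DIM = 12  # flattened observation history
PRIV_DIM = 3      # privileged observation: body linear velocity (x, y, z)

def build_privileged_obs(body_lin_vel, priv_observe_body_velocity):
    """Step 1: include the body velocity in the privileged observation only if the flag is set."""
    parts = []
    if priv_observe_body_velocity:
        parts.append(body_lin_vel)
    if not parts:
        return np.zeros((body_lin_vel.shape[0], 0))
    return np.concatenate(parts, axis=1)

# Synthetic data: pretend the true velocity is a fixed linear function of the history.
true_map = rng.normal(size=(HISTORY_DIM, PRIV_DIM))
obs_history = rng.normal(size=(256, HISTORY_DIM))
priv_obs = build_privileged_obs(obs_history @ true_map, priv_observe_body_velocity=True)

# Stand-in adaptation module: one linear layer (an assumption; a real one would be an MLP).
W = np.zeros((HISTORY_DIM, PRIV_DIM))

def adaptation_loss(weights):
    """Mean squared error between the predicted and true privileged observation."""
    return np.mean((obs_history @ weights - priv_obs) ** 2)

# Step 2: supervised gradient steps that regress the privileged observation.
initial_loss = adaptation_loss(W)
for _ in range(500):
    pred = obs_history @ W
    grad = 2.0 * obs_history.T @ (pred - priv_obs) / pred.size
    W -= 0.5 * grad
final_loss = adaptation_loss(W)

# Step 3: at inference, the predicted latent is concatenated to the policy observation.
# Here the history row doubles as the policy observation, purely for brevity.
latent = obs_history[:1] @ W
policy_input = np.concatenate([obs_history[:1], latent], axis=1)
print(initial_loss, final_loss, policy_input.shape)
```

In this sketch the policy never sees the true velocity at inference time; it only receives the adaptation module's prediction, which is what makes the approach usable on hardware where the privileged quantity is not directly measured.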