Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Questions about Actuator Network #67

Closed TextZip closed 4 months ago

TextZip commented 5 months ago

Hi @gmargo11 ,

The paper briefly goes over actuator modeling and reducing the sim-to-real gap via a trained actuator network. Could you please shed some more light on this, since there aren't many details about it in the paper? I did notice some training and eval scripts for the actuator net in the repo, but I still have a few questions:

  1. How much of a difference did you notice with and without the actuator model? Is the performance with an actuator network comparable to that of a policy trained with a good amount of domain randomization?
  2. How was the dataset for training this model collected/prepared?
  3. How accurate can the model get, and how do you deal with cases where the input to the actuator network is out of the state distribution? (For example, are there any safety measures in place to make it safe for use in unseen situations?)

Thanks a lot once again for your time, this repo is truly a treasure trove of learning.

gmargo11 commented 5 months ago

Hi @TextZip ,

I recently inspected the actuator network in this repo and drew a few new conclusions.

The actuator network models a few specific properties of the electric motor, such as velocity-dependent torque limits and internal joint damping.
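As a rough illustration of what such a network looks like, here is a minimal sketch in NumPy: a small MLP that maps a short history of joint position errors and joint velocities to a predicted output torque. The 3-step history, layer sizes, and softsign activation are assumptions for illustration, not the exact architecture shipped in this repo, and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

class ActuatorNet:
    """Hypothetical actuator-net sketch: (position-error, velocity) history -> torque."""

    def __init__(self, history_len=3, hidden=32):
        in_dim = 2 * history_len  # one (pos error, velocity) pair per history step
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.w3 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b3 = np.zeros(1)

    @staticmethod
    def softsign(x):
        return x / (1.0 + np.abs(x))

    def predict_torque(self, pos_err_hist, vel_hist):
        # Concatenate the recent history into one input vector,
        # then run a plain two-hidden-layer MLP forward pass.
        x = np.concatenate([pos_err_hist, vel_hist])
        h = self.softsign(x @ self.w1 + self.b1)
        h = self.softsign(h @ self.w2 + self.b2)
        return float(h @ self.w3 + self.b3)

net = ActuatorNet()
tau = net.predict_torque(np.array([0.05, 0.04, 0.02]),
                         np.array([0.50, 0.60, 0.70]))
```

In simulation, the trained network replaces the ideal PD-to-torque mapping: the policy's position targets are converted to torques by the network rather than by the simulator's idealized actuator.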

Regarding your questions:

  1. The actuator network helps -- less for the walk-these-ways policies, since they have good heuristics in the reward function, but more so for other policies. Instead of using the actuator network, explicitly modeling actuator properties like velocity-dependent torque limits and internal joint damping can also work. I have yet to evaluate the difference thoroughly on the real robot, but both work decently for the controllers I've trained.
  2. The training dataset for the actuator network was collected by running a walking controller trained with no actuator network. I ran the robot around with different gaits and dropped it from some height a few times to incur high torques.
  3. The actuator network is potentially prone to out-of-distribution issues, since there's not much training data near the torque and velocity limits or where the action changes very fast. If you train policies without regularization, like an action rate penalty, this could become significant. I'd say it's worth trying some combination of explicitly modeling the actuator properties above and doing some domain randomization to see if it works better in those cases.
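The "explicit modeling" alternative mentioned in item 1 could be sketched roughly as follows: pass the commanded torque through a linear torque-speed limit and subtract an internal viscous damping term. All constants below are illustrative placeholders, not Go1 datasheet values.

```python
import numpy as np

# Illustrative constants -- NOT actual Go1 motor parameters.
TAU_MAX = 23.7        # stall torque [Nm]
VEL_MAX = 30.1        # no-load speed [rad/s]
JOINT_DAMPING = 0.1   # internal viscous damping [Nm*s/rad]

def applied_torque(tau_cmd, joint_vel):
    """Velocity-dependent torque limit plus internal joint damping."""
    # Linear torque-speed curve: available torque in the direction of
    # motion shrinks as joint speed rises toward the no-load speed.
    upper = TAU_MAX * np.clip(1.0 - joint_vel / VEL_MAX, 0.0, 1.0)
    lower = -TAU_MAX * np.clip(1.0 + joint_vel / VEL_MAX, 0.0, 1.0)
    tau = np.clip(tau_cmd, lower, upper)
    # Internal damping opposes motion regardless of the command.
    return tau - JOINT_DAMPING * joint_vel
```

For example, a large commanded torque at zero velocity is clamped to the stall torque, while the same command at the no-load speed yields no net accelerating torque.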
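For the data collection in item 2, a minimal logging loop might record the commanded joint target, measured joint state, and measured torque at each control tick while an existing controller runs. The CSV schema, timestep, and the hard-coded sample values here are all hypothetical stand-ins for readings from the robot SDK.

```python
import csv

def log_step(writer, t, q_des, q, qdot, tau):
    """Append one (time, command, state, torque) sample to the dataset."""
    writer.writerow([t, q_des, q, qdot, tau])

with open("actuator_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "q_des", "q", "qdot", "tau"])
    # In a real run, these tuples would come from the robot each control tick;
    # the two samples below are placeholders so the sketch is runnable.
    for i, (q_des, q, qdot, tau) in enumerate([(0.30, 0.28, 0.5, 2.1),
                                               (0.30, 0.29, 0.3, 1.7)]):
        log_step(writer, i * 0.002, q_des, q, qdot, tau)
```

Supervised training then fits the network to predict `tau` from a short history of position errors (`q_des - q`) and velocities (`qdot`).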

Gabe