Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Questions about Actuator Network #67

Closed TextZip closed 4 months ago

TextZip commented 5 months ago

Hi @gmargo11 ,

The paper briefly goes over actuator modeling and reducing the sim-to-real gap via a trained actuator network. Could you please shed some more light on this, since there aren't many details about it in the paper? I did notice some training and eval scripts for the actuator net in the repo, but I still have a few questions:

  1. How much of a difference did you notice with and without the actuator model? Is the performance with an actuator network comparable to that of a policy trained with a good amount of domain randomization?
  2. How was the dataset for training this model collected/prepared?
  3. How accurate can the model get, and how do you deal with cases where the input to the actuator network is out of the state distribution? (For example, are there any safety measures in place to make it safe for use in unseen situations?)

Thanks a lot once again for your time, this repo is truly a treasure trove of learning.

gmargo11 commented 5 months ago

Hi @TextZip ,

I recently inspected the actuator network in this repo and drew a few new conclusions.

The actuator network models a few specific properties of the electric motor, such as velocity-dependent torque limits and internal joint damping.
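As a rough illustration of what such a network looks like, here is a minimal sketch in NumPy: a small MLP that maps a short history of joint position errors and joint velocities to a predicted output torque. The 3-step history, layer sizes, and softsign activation are assumptions for illustration, not the exact architecture shipped in this repo, and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

class ActuatorNet:
    """Hypothetical actuator-net sketch: (position-error, velocity) history -> torque."""

    def __init__(self, history_len=3, hidden=32):
        in_dim = 2 * history_len  # one (pos error, velocity) pair per history step
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.w3 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b3 = np.zeros(1)

    @staticmethod
    def softsign(x):
        return x / (1.0 + np.abs(x))

    def predict_torque(self, pos_err_hist, vel_hist):
        # Concatenate the recent history into one input vector,
        # then run a plain two-hidden-layer MLP forward pass.
        x = np.concatenate([pos_err_hist, vel_hist])
        h = self.softsign(x @ self.w1 + self.b1)
        h = self.softsign(h @ self.w2 + self.b2)
        return float(h @ self.w3 + self.b3)

net = ActuatorNet()
tau = net.predict_torque(np.array([0.05, 0.04, 0.02]),
                         np.array([0.50, 0.60, 0.70]))
```

In simulation, the trained network replaces the ideal PD-to-torque mapping: the policy's position targets are converted to torques by the network rather than by the simulator's idealized actuator.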

Regarding your questions:

  1. The actuator network helps -- less for the walk-these-ways policies, since they have good heuristics in the reward function, but more so for other policies. Instead of using the actuator network, explicitly modeling actuator properties like velocity-dependent torque limits and internal joint damping can also work. I have yet to evaluate the difference thoroughly on the real robot, but both work decently for the controllers I've trained.
  2. The training dataset for the actuator network was collected by running a walking controller trained with no actuator network. I ran the robot around with different gaits and dropped it from some height a few times to incur high torques.
  3. The actuator network is potentially prone to out-of-distribution issues, since there's not much training data near the torque and velocity limits or where the action changes very fast. If you train policies without regularization, like an action rate penalty, this could become significant. I'd say it's worth trying some combination of explicitly modeling the actuator properties above and doing some domain randomization to see if it works better in those cases.
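The "explicit modeling" alternative mentioned in item 1 could be sketched roughly as follows: pass the commanded torque through a linear torque-speed limit and subtract an internal viscous damping term. All constants below are illustrative placeholders, not Go1 datasheet values.

```python
import numpy as np

# Illustrative constants -- NOT actual Go1 motor parameters.
TAU_MAX = 23.7        # stall torque [Nm]
VEL_MAX = 30.1        # no-load speed [rad/s]
JOINT_DAMPING = 0.1   # internal viscous damping [Nm*s/rad]

def applied_torque(tau_cmd, joint_vel):
    """Velocity-dependent torque limit plus internal joint damping."""
    # Linear torque-speed curve: available torque in the direction of
    # motion shrinks as joint speed rises toward the no-load speed.
    upper = TAU_MAX * np.clip(1.0 - joint_vel / VEL_MAX, 0.0, 1.0)
    lower = -TAU_MAX * np.clip(1.0 + joint_vel / VEL_MAX, 0.0, 1.0)
    tau = np.clip(tau_cmd, lower, upper)
    # Internal damping opposes motion regardless of the command.
    return tau - JOINT_DAMPING * joint_vel
```

For example, a large commanded torque at zero velocity is clamped to the stall torque, while the same command at the no-load speed yields no net accelerating torque.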
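For the data collection in item 2, a minimal logging loop might record the commanded joint target, measured joint state, and measured torque at each control tick while an existing controller runs. The CSV schema, timestep, and the hard-coded sample values here are all hypothetical stand-ins for readings from the robot SDK.

```python
import csv

def log_step(writer, t, q_des, q, qdot, tau):
    """Append one (time, command, state, torque) sample to the dataset."""
    writer.writerow([t, q_des, q, qdot, tau])

with open("actuator_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "q_des", "q", "qdot", "tau"])
    # In a real run, these tuples would come from the robot each control tick;
    # the two samples below are placeholders so the sketch is runnable.
    for i, (q_des, q, qdot, tau) in enumerate([(0.30, 0.28, 0.5, 2.1),
                                               (0.30, 0.29, 0.3, 1.7)]):
        log_step(writer, i * 0.002, q_des, q, qdot, tau)
```

Supervised training then fits the network to predict `tau` from a short history of position errors (`q_des - q`) and velocities (`qdot`).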

Gabe