Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Question: Status of Pretrained Model #22

Closed asalbright closed 1 year ago

asalbright commented 1 year ago

Hi there, and thanks for providing the repo along with a pretrained model.

I am wondering where the pretrained model comes from. Roughly what iteration is it from, and what is the status of the curriculum?

When I inspect the status of the provided curriculum:

runs/gait-conditioned-agility/pretrain-v0/train/025417.456545/curriculum/distribution.pkl

I see that `distribution["iteration"]` returns 0, and looking at `distribution["distribution"]`, the values appear to be limited to the initial ones: [-1, 1] for the linear/angular velocities.
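For reference, the pickle can be inspected with a few lines. Below is a minimal, self-contained sketch that round-trips a hypothetical curriculum dict with just the two keys mentioned above; the key names inside `"distribution"` are illustrative, and the real file may contain more fields:

```python
import io
import pickle

# Hypothetical curriculum state mirroring the keys described in this thread
# ("iteration" and "distribution"); the actual layout of walk-these-ways'
# distribution.pkl may differ.
state = {
    "iteration": 0,
    "distribution": {"lin_vel_x": [-1.0, 1.0], "ang_vel_yaw": [-1.0, 1.0]},
}

# Round-trip through pickle the same way curriculum/distribution.pkl is read
# (with a real file, replace the buffer with open(path, "rb")).
buf = io.BytesIO()
pickle.dump(state, buf)
buf.seek(0)
distribution = pickle.load(buf)

print(distribution["iteration"])     # 0, matching the value reported above
print(distribution["distribution"])
```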

I am curious because I would like to fine-tune your weights, which seem to be working well. However, if I want to resume training from those weights, I assume I would also want to resume from the matching curriculum, no?

asalbright commented 1 year ago

I'm realizing that the lack of a "current" or "matching" curriculum doesn't seem to matter all that much: since the pretrained policy can already achieve all the reward thresholds, it moves back up the curriculum rather quickly.

Still, I am interested in where the policy came from.

gmargo11 commented 1 year ago

Hi @asalbright ,

The pretrained model was trained for 40k iterations, well past convergence. It was also trained with higher gravity randomization than the default settings in train.py (`Cfg.domain_rand.gravity_range = [-2.0, 2.0]`), which improves robustness but makes the training result a bit less consistent (see https://github.com/Improbable-AI/walk-these-ways/issues/8 for discussion).
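For anyone trying to reproduce this setting, the override is a one-line config change applied before training starts. This is a sketch, assuming the `Cfg` config object used by the repo's train.py is already in scope; only the `gravity_range` value comes from the comment above.

```python
# Sketch: apply the wider gravity randomization used for the pretrained model.
# Assumes Cfg (the repo's training config object) is in scope; place this
# before the training loop is launched so the randomization takes effect.
Cfg.domain_rand.gravity_range = [-2.0, 2.0]
```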

I think the distribution.pkl may have been logged incorrectly. However, as you suggest, an easy fix is to resume training; the pretrained policy will quickly advance back up the curriculum.

-Gabe