Batou1406 commented 1 month ago

Speed Task

Discussion thread on tuning the configuration for the speed task.

Current Results

Currently, Aliengo reaches 60% of the speed curriculum difficulty, which corresponds to 60% of the maximum velocity ($3[\frac{m}{s}]$) → $\approx 1.8[\frac{m}{s}]$, with a very nice gait pattern and no noticeable simulator glitches.

speedGood

Latest Noticeable Improvements

Height Tracking Soft Exponential Kernel

I reused the soft exponential kernel definition presented in #20 to track the robot height. Function visualization. This gives a sharp exponential kernel while allowing some tolerance on the exact height target. Moreover, I implemented a proprioceptive height computation, which makes this function usable on all sorts of terrain, an improvement over what was originally implemented in 'orbit'.
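The exact kernel is defined in #20 and is not reproduced in this thread, so the snippet below is only a hypothetical sketch of what a "soft" exponential kernel with a tolerance band could look like. The function name `soft_exp_kernel`, the dead-zone formulation, and the default values of `tolerance` and `std` are all assumptions made for illustration.

```python
import numpy as np


def soft_exp_kernel(error, tolerance=0.02, std=0.05):
    """Hypothetical soft exponential kernel (assumed form, not the one from #20).

    error     : height error(s) in metres, e.g. target height minus measured height
    tolerance : half-width of the band where the reward stays at 1 (assumed value)
    std       : controls how sharply the reward decays outside the band (assumed value)
    """
    error = np.asarray(error, dtype=float)
    # Only the part of the error that exceeds the tolerance band is penalised.
    excess = np.maximum(np.abs(error) - tolerance, 0.0)
    # Standard squared-exponential decay applied to the excess error.
    return np.exp(-np.square(excess) / std**2)
```

With these assumed defaults, any error within ±2 cm yields the full reward of 1, and the reward then decays quickly as the error grows, which matches the "sharp but tolerant" behaviour described above.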

Foot Closeness penalty

I added a penalty for feet that are too close to each other (in the xy plane). The distance is the Euclidean distance (so one can define a circle around each foot, inside which any other foot is penalized). Three types of kernel have been implemented: 'constant', 'linear' and 'quadratic'. The 'constant' kernel has been tested: it successfully prevents the robot from superposing its feet and fixed the problematic behaviour.
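A minimal sketch of such a penalty, assuming the feet positions are available as an array of xy coordinates. The function name `close_feet_penalty`, the `threshold` radius, and the way the linear and quadratic kernels are normalised are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np


def close_feet_penalty(feet_xy, threshold=0.1, kernel="constant"):
    """Penalty for pairs of feet closer than `threshold` in the xy plane.

    feet_xy   : array of shape (num_feet, 2) with foot positions in metres
    threshold : radius of the circle around each foot (assumed value)
    kernel    : 'constant', 'linear' or 'quadratic'
    """
    feet_xy = np.asarray(feet_xy, dtype=float)
    # Pairwise Euclidean distances between all feet.
    diff = feet_xy[:, None, :] - feet_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep each unordered pair once (upper triangle, no diagonal).
    i, j = np.triu_indices(len(feet_xy), k=1)
    # 0 outside the circle, growing to 1 when two feet coincide (assumed normalisation).
    violation = np.clip(1.0 - dist[i, j] / threshold, 0.0, None)

    if kernel == "constant":
        per_pair = (violation > 0.0).astype(float)
    elif kernel == "linear":
        per_pair = violation
    elif kernel == "quadratic":
        per_pair = violation**2
    else:
        raise ValueError(f"unknown kernel: {kernel!r}")
    return float(per_pair.sum())
```

In this sketch the 'constant' kernel gives a fixed cost as soon as a foot enters another foot's circle, while 'linear' and 'quadratic' scale the cost with how deep the intrusion is.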

Interesting results

After extensive training, a gallop gait emerged. However, it exploits a simulation artefact by crossing the legs, which shouldn't be possible on a real setup or with self-collision enabled. Still, it is interesting to see that a gallop gait may indeed be an optimum, and I believe we may see a correct one with a bit more exploration.

Potential limit

The way the curriculum is computed may still hinder progression. Indeed, only the distance moved away from the origin is rewarded when traveling at high speed. This does not account for the total distance walked, since the robot also has an angular velocity to track. Computing the total distance properly may indeed be tricky.

Next Steps

[x] Fix the distance computation in the curriculum to see if this changes the training performance
[ ] Penalise very low leg frequency and very high duty cycle to avoid weird behaviour at zero speed

Batou1406 commented 4 weeks ago

Implementation update

I changed the way the 'walked distance' is computed. Instead of taking the difference between the start and finish positions (which does not account for the curvature induced by the angular velocity), I now keep track of the 'cumulative' distance walked (simple forward integration of the instantaneous speed along the episode horizon). This effectively fixes the curriculum problem!
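The difference between the two measures can be sketched as follows. The function names and the assumption that per-step xy velocities and a fixed time step `dt` are available are illustrative only.

```python
import numpy as np


def net_displacement(positions_xy):
    """Straight-line distance between the first and last position of an episode."""
    positions_xy = np.asarray(positions_xy, dtype=float)
    return float(np.linalg.norm(positions_xy[-1] - positions_xy[0]))


def cumulative_distance(velocities_xy, dt):
    """Forward (rectangle-rule) integration of the instantaneous speed."""
    velocities_xy = np.asarray(velocities_xy, dtype=float)
    speeds = np.linalg.norm(velocities_xy, axis=-1)
    return float(np.sum(speeds) * dt)
```

For a robot that tracks a constant forward speed together with a non-zero yaw rate, the path is an arc: the net displacement can be close to zero after a full turn, while the cumulative distance still reflects how far the robot actually walked.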

However, a gallop gait still emerges, but with crossed feet, which is not feasible. Self-collision isn't enabled in simulation, so it is possible there. I will try enabling self-collision and see whether the model can still be simulated.

In conclusion, some tuning is still required, but we're close to the objective!

speed1

Batou1406 commented 3 weeks ago

Self Collision and Rewards

I enabled self-collision for the robot and didn't notice any change in (computational) performance. However, the robot could no longer learn a walking policy and fell into a 'standing policy' local minimum. Thus I had to reduce the number of penalties. The new reward function consists of:

track_lin_vel_xy_exp
track_ang_vel_z_exp
track_robot_height_exp
penalty_lin_vel_z_l2
penalty_ang_vel_xy_l2
penalty_dof_torques_l2
penalty_dof_acc_l2
undesired_contacts
flat_orientation_l2
dof_pos_limits
penalty_friction
penalty_CoT
penalty_close_feet

Curriculum

Moreover, I made it more difficult to progress in the curriculum. As a reminder, the speed curriculum consists of a difficulty in [0,1] (common to all robots) and a maximum velocity range. The speed is then sampled uniformly in [0, difficulty*maximum velocity range]. Finally, we update the difficulty based on the performance of the robots that had to walk at at least 90% of the current maximum speed. The difficulty increases if the robot walked at least 90% of the required distance, and decreases if it walked less than 70%. This showed great and consistent results.
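The update rule described above can be sketched roughly as follows. Several details are assumptions made for illustration: the function names, the increment `step`, the use of the mean ratio over the fast robots, and taking the required distance as commanded speed times episode duration.

```python
import numpy as np


def sample_speed_commands(rng, difficulty, max_speed, num_envs):
    """Sample commanded speeds uniformly in [0, difficulty * max_speed]."""
    return rng.uniform(0.0, difficulty * max_speed, size=num_envs)


def update_difficulty(difficulty, commanded_speed, walked_distance,
                      episode_length_s, max_speed,
                      step=0.05, fast_fraction=0.9, up_ratio=0.9, down_ratio=0.7):
    """Return the new shared difficulty in [0, 1] (step size is an assumed value)."""
    commanded_speed = np.asarray(commanded_speed, dtype=float)
    walked_distance = np.asarray(walked_distance, dtype=float)
    current_max = difficulty * max_speed
    if current_max <= 0.0:
        return float(difficulty)  # nothing to evaluate at zero speed

    # Only robots commanded at >= 90% of the current maximum speed are considered.
    fast = commanded_speed >= fast_fraction * current_max
    if not np.any(fast):
        return float(difficulty)

    # Assumption: required distance = commanded speed * episode duration.
    required = commanded_speed[fast] * episode_length_s
    ratio = float(np.mean(walked_distance[fast] / required))

    if ratio >= up_ratio:
        difficulty += step
    elif ratio < down_ratio:
        difficulty -= step
    return float(np.clip(difficulty, 0.0, 1.0))
```

Between the two thresholds (70% to 90% of the required distance) the difficulty is left unchanged, which gives the curriculum some hysteresis instead of oscillating at every update.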

Results

:warning: Videos are displayed slower than real time :warning:

After 2'100 Iterations

Achieved ~70% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 70% of $3 [\frac{m}{s}]$ → $\approx 1.70 [\frac{m}{s}]$, potentially at $2.1[\frac{m}{s}]$

1.4 [m/s]

speed1.4ms.webm

1.8 [m/s]

speed1.8ms.webm

After 15'000 iterations

Achieved ~85% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 85% of $3 [\frac{m}{s}]$ → $\approx 2.07 [\frac{m}{s}]$, potentially at $2.55[\frac{m}{s}]$

Batou1406 / dls_orbit_bat_private

Speed Task #22