Batou1406 / dls_orbit_bat_private

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-orbit.github.io/orbit/

Speed Task #22

Open Batou1406 opened 1 month ago

Batou1406 commented 1 month ago

Speed Task

Topic to discuss the tuning configuration for the speed task.

Current Results

Currently, Aliengo reaches 60% of the speed curriculum difficulty, which corresponds to 60% of the maximum velocity ($3[\frac{m}{s}]$) → $\approx 1.8[\frac{m}{s}]$, with a very nice gait pattern and no noticeable simulator glitches.

speedGood

Latest Noticeable Improvements

Height Track soft exponential Kernel

I reused the soft exponential kernel defined in #20 to track the robot height. Function visualization. This gives a sharp exponential kernel while allowing some tolerance around the exact height target. Moreover, I implemented a proprioceptive height computation, which makes this function usable on all sorts of terrain, an improvement over what was originally implemented in 'orbit'.
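The exact kernel is defined in #20 and is not reproduced in this thread, so the following is only a minimal sketch of one possible "soft" exponential kernel. The function name, the tolerance band, the squared-exponential shape and the default values are all assumptions made for illustration: the reward stays at 1 inside the tolerance band and decays sharply outside it.

```python
import math


def soft_exp_kernel(error: float, tolerance: float = 0.02, std: float = 0.05) -> float:
    """Hypothetical soft exponential kernel (not the definition from #20).

    Returns 1.0 while |error| <= tolerance, then decays as a squared
    exponential of the excess error, scaled by std.
    """
    excess = max(abs(error) - tolerance, 0.0)  # error beyond the tolerated band
    return math.exp(-((excess / std) ** 2))


# Height error in metres: target height minus estimated height (assumed inputs).
print(soft_exp_kernel(0.01))  # inside the band, full reward
print(soft_exp_kernel(0.10))  # well outside the band, reward close to zero
```

A smaller `std` makes the kernel sharper, while `tolerance` sets how far the height may drift before any reward is lost.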

Foot Closeness penalty

I added a penalty for feet that are too close to each other (in the xy plane). The distance is the Euclidean distance (so one can picture a circle around each foot inside which other feet are penalized). Three types of kernel have been implemented: 'constant', 'linear' and 'quadratic'. The 'constant' kernel has been tested: it successfully prevents the robot from superposing its feet and fixed the problematic behaviour.

Interesting results

After extensive training, a gallop gait emerged. However, it exploits a simulation artefact by crossing the legs, which should not be possible on a real robot or with self-collision enabled. Still, it is interesting to see that a gallop gait may indeed be an optimum, and I believe we may see a correct one with a bit more exploration.
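As an illustration of the three kernels, here is a minimal sketch of a pairwise foot closeness penalty. The function names, the threshold value, the normalisation of the linear and quadratic kernels, and the choice to sum over foot pairs are assumptions, not the repository's implementation.

```python
import math
from itertools import combinations


def closeness_kernel(dist: float, threshold: float, kind: str = "constant") -> float:
    """Penalty for one pair of feet at xy distance dist (assumed kernel shapes)."""
    if dist >= threshold:
        return 0.0  # outside the circle around the foot: no penalty
    depth = (threshold - dist) / threshold  # 0 at the circle edge, 1 when feet overlap
    if kind == "constant":
        return 1.0
    if kind == "linear":
        return depth
    if kind == "quadratic":
        return depth**2
    raise ValueError(f"unknown kernel: {kind}")


def foot_closeness_penalty(feet_xy, threshold: float = 0.1, kind: str = "constant") -> float:
    """Sum the kernel over every pair of feet, using the Euclidean xy distance."""
    return sum(
        closeness_kernel(math.dist(a, b), threshold, kind)
        for a, b in combinations(feet_xy, 2)
    )


# Four feet (x, y) in metres; the first two are only 5 cm apart.
feet = [(0.0, 0.0), (0.05, 0.0), (0.4, 0.3), (0.4, -0.3)]
for kind in ("constant", "linear", "quadratic"):
    print(kind, foot_closeness_penalty(feet, threshold=0.1, kind=kind))
```

The 'constant' kernel penalizes any intrusion equally, whereas 'linear' and 'quadratic' grow as the feet get closer, which gives a smoother signal near the edge of the circle.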

Potential limit

The way the curriculum is computed may still hinder progression. Indeed, only the distance moved away from the origin is rewarded when traveling at high speed. This does not account for the total walked distance, since the robot also has an angular velocity to track. Computing the total distance effectively may indeed be tricky.

Next Steps

Batou1406 commented 4 weeks ago

Implementation update

I changed the way the 'walked distance' is computed. Instead of taking the difference between start and finish (which doesn't account for the curvature induced by the angular velocity), I keep track of the 'cumulative' distance walked (simple forward integration of the instantaneous speed along the episode horizon). This effectively fixes the curriculum problem!

However, a gallop gait still emerges with crossed feet, which is not feasible. It is possible in simulation because self-collision isn't enabled. I will try to enable self-collision and see if the model can still be simulated.

In conclusion, some tuning is still required, but we're close to the objective!
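To make the difference between the two measures concrete, here is a minimal sketch comparing start-to-finish displacement with the cumulative distance obtained by forward integration of the instantaneous speed. The function names and the example trajectory (a constant 1 m/s while turning through one full circle) are assumptions chosen for illustration.

```python
import math


def net_displacement(velocities_xy, dt: float) -> float:
    """Start-to-finish distance: integrate the velocity vector, then take its norm."""
    x = sum(vx * dt for vx, _ in velocities_xy)
    y = sum(vy * dt for _, vy in velocities_xy)
    return math.hypot(x, y)


def cumulative_distance(velocities_xy, dt: float) -> float:
    """Path length: forward-integrate the instantaneous speed (velocity norm)."""
    return sum(math.hypot(vx, vy) * dt for vx, vy in velocities_xy)


# Hypothetical episode: 1 m/s while turning through one full circle in 10 s.
dt, steps, speed = 0.01, 1000, 1.0
velocities = [
    (speed * math.cos(2 * math.pi * k / steps), speed * math.sin(2 * math.pi * k / steps))
    for k in range(steps)
]
print(net_displacement(velocities, dt))     # close to 0 m: the robot ends where it started
print(cumulative_distance(velocities, dt))  # close to 10 m: the distance actually walked
```

With a non-zero angular velocity command, the displacement measure under-reports a robot that tracks its speed command perfectly, which is why it could stall curriculum progression.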

speed1

Batou1406 commented 3 weeks ago

Self Collision and Rewards

I enabled self-collision for the robot and didn't notice any change in (computational) performance. However, the robot could no longer learn a walking policy and fell into a 'standing policy' local minimum. Thus I had to reduce the number of penalties. The new reward function consists of:

Curriculum

Moreover, I made it more difficult to progress in the curriculum. As a reminder, the speed curriculum consists of a difficulty in [0,1] (common to all robots) and a maximum velocity range. The speed is then sampled uniformly in [0, difficulty*maximum velocity range]. Finally, we update the difficulty based on the performance of the robots that had to walk at at least 90% of the current maximum speed. The difficulty increases if the robot walked at least 90% of the required distance, and decreases if it walked less than 70%. This showed great and consistent results.
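The update rule described above could be sketched roughly as follows. This is not the repository's code: the function names, the step size, the use of the mean over eligible robots, and the definition of the required distance as commanded speed times episode length are all assumptions.

```python
import random


def sample_speed_command(difficulty: float, max_speed: float, rng=random) -> float:
    """Sample a speed command uniformly in [0, difficulty * max_speed]."""
    return rng.uniform(0.0, difficulty * max_speed)


def update_difficulty(
    difficulty: float,
    commanded_speeds,
    walked_distances,
    max_speed: float,
    episode_length_s: float,
    step: float = 0.05,  # assumed increment, not taken from the source
) -> float:
    """Raise or lower the shared difficulty from the fastest-commanded robots only."""
    speed_cap = difficulty * max_speed
    # Keep only robots commanded at >= 90% of the current maximum speed.
    ratios = [
        dist / (speed * episode_length_s)  # walked / required distance (assumed definition)
        for speed, dist in zip(commanded_speeds, walked_distances)
        if speed > 0.0 and speed >= 0.9 * speed_cap
    ]
    if not ratios:
        return difficulty  # no eligible robot this round
    mean_ratio = sum(ratios) / len(ratios)  # assumed aggregation
    if mean_ratio >= 0.9:
        difficulty += step
    elif mean_ratio < 0.7:
        difficulty -= step
    return min(max(difficulty, 0.0), 1.0)


# Difficulty 0.5 with a 3 m/s range: only the 1.4 and 1.5 m/s robots are eligible.
print(update_difficulty(0.5, [1.4, 1.5, 0.5], [13.5, 14.0, 1.0], 3.0, 10.0))
```

Under these assumptions, a robot that counts towards an increase at difficulty $d$ was commanded at least $0.9 \cdot d \cdot v_{max}$ and covered at least 90% of the corresponding distance, which is where the "90% of 90%" factor in the results comes from.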

Results

:warning: Videos are displayed slower than real time :warning:

After 2'100 Iterations

Achieved ~70% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 70% of $3 [\frac{m}{s}]$ → $\approx 1.70 [\frac{m}{s}]$, potentially at $2.1[\frac{m}{s}]$

1.4 [m/s]

speed1.4ms.webm

1.8 [m/s]

speed1.8ms.webm

After 15'000 iterations

Achieved ~85% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 85% of $3 [\frac{m}{s}]$ → $\approx 2.07 [\frac{m}{s}]$, potentially at $2.55[\frac{m}{s}]$

1.7 [m/s]

speed21.7ms.webm

2.0 [m/s]

speed22.0ms.webm

2.3 [m/s]

speed22.3ms.webm