Open Batou1406 opened 1 month ago
I changed the way the 'walked distance' is computed. Instead of taking the difference between the start and end positions (which doesn't take into account the curvature induced by the angular velocity), I now keep track of the 'cumulative' distance walked (simple forward integration of the instantaneous speed along the episode horizon). This effectively fixes the curriculum problem!
However, a gallop gait still emerges, but with crossed feet, which is not physically feasible. Self-collision isn't enabled in simulation, so it is possible there. I will try to enable self-collision and see whether the model can still be simulated.
In conclusion, some tuning is still required, but we're close to the objective!
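To illustrate why the cumulative distance behaves better than the start-to-finish difference when the robot also tracks an angular velocity, here is a minimal sketch in plain Python. The function names, the unicycle-style rollout and the time step are assumptions made for illustration and are not taken from the actual training code.

```python
import math


def net_displacement(positions):
    """Straight-line distance between the first and last xy position."""
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    return math.hypot(x1 - x0, y1 - y0)


def cumulative_distance(speeds, dt):
    """Forward (rectangle-rule) integration of the instantaneous speed."""
    return sum(speed * dt for speed in speeds)


def simulate_arc(speed, yaw_rate, dt, steps):
    """Hypothetical rollout: constant forward speed and constant yaw rate."""
    x = y = yaw = 0.0
    positions = [(x, y)]
    speeds = []
    for _ in range(steps):
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
        yaw += yaw_rate * dt
        positions.append((x, y))
        speeds.append(speed)
    return positions, speeds


if __name__ == "__main__":
    # Roughly half a circle of radius 1 m, walked at 1 m/s.
    positions, speeds = simulate_arc(speed=1.0, yaw_rate=1.0, dt=0.01, steps=314)
    print("net displacement   :", net_displacement(positions))
    print("cumulative distance:", cumulative_distance(speeds, dt=0.01))
```

On a curved path the net displacement underestimates how far the robot actually walked, while the integrated speed does not depend on the path shape.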
I enabled self-collision for the robot and didn't notice any change in computational performance. However, the robot could no longer learn a walking policy and fell into a 'standing policy' local minimum. Thus I had to reduce the number of penalties. The new reward function consists of:
Moreover, I made it more difficult to progress through the curriculum. As a reminder, the speed curriculum consists of a difficulty in [0,1] (common to all robots) and a maximum velocity range. The speed is then sampled uniformly in [0, difficulty × maximum velocity range]. Finally, we update the difficulty based on the performance of the robots that had to walk at at least 90% of the current maximum speed. The difficulty increases if the robot walked at least 90% of the required distance, and decreases if it walked less than 70%. This showed great and consistent results.
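The curriculum rule described above can be sketched as follows. This is only an illustration: the function names, the difficulty step of 0.05, the averaging over eligible robots and the clamping to [0, 1] are assumptions, not the actual implementation. Only the 90% / 90% / 70% thresholds and the $3 [\frac{m}{s}]$ maximum come from this thread.

```python
import random

MAX_VELOCITY_RANGE = 3.0  # m/s, value quoted in this thread


def sample_speed_command(difficulty, max_velocity=MAX_VELOCITY_RANGE, rng=random):
    """Sample a commanded speed uniformly in [0, difficulty * max_velocity]."""
    return rng.uniform(0.0, difficulty * max_velocity)


def update_difficulty(difficulty, commanded_speeds, distance_ratios,
                      max_velocity=MAX_VELOCITY_RANGE, step=0.05):
    """Update the shared difficulty from the fastest-commanded robots.

    commanded_speeds: speed each robot was asked to walk at (m/s).
    distance_ratios:  walked distance / required distance, per robot.
    """
    # Only robots commanded at >= 90% of the current maximum speed count.
    threshold = 0.9 * difficulty * max_velocity
    eligible = [ratio for speed, ratio in zip(commanded_speeds, distance_ratios)
                if speed >= threshold]
    if not eligible:
        return difficulty  # nothing to judge on, keep the difficulty

    mean_ratio = sum(eligible) / len(eligible)  # assumption: aggregate by mean
    if mean_ratio >= 0.9:
        difficulty += step   # walked at least 90% of the required distance
    elif mean_ratio < 0.7:
        difficulty -= step   # walked less than 70% of the required distance
    return min(1.0, max(0.0, difficulty))
```

Restricting the update to robots sampled near the top of the range means the difficulty only moves when the hardest commands are actually being evaluated.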
:warning: Videos are displayed slower than real time :warning:
Achieved ~70% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 70% of $3 [\frac{m}{s}]$ → $\approx 1.70 [\frac{m}{s}]$, potentially at $2.1[\frac{m}{s}]$
Achieved ~85% of the curriculum, i.e. successfully traveled at at least 90% of 90% of 85% of $3 [\frac{m}{s}]$ → $\approx 2.07 [\frac{m}{s}]$, potentially at $2.55[\frac{m}{s}]$
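The speed bounds quoted for the two videos follow from chaining the curriculum thresholds. A small helper makes the arithmetic explicit; the function names are hypothetical, and the 0.9 factors and the $3 [\frac{m}{s}]$ maximum are the values given in this thread.

```python
def guaranteed_speed(difficulty, max_velocity=3.0,
                     command_fraction=0.9, distance_fraction=0.9):
    """Lower bound on the average speed implied by reaching `difficulty`.

    The robot was commanded at >= command_fraction of the current maximum
    speed and covered >= distance_fraction of the required distance.
    """
    return distance_fraction * command_fraction * difficulty * max_velocity


def top_commanded_speed(difficulty, max_velocity=3.0):
    """Highest speed that can be sampled at this difficulty."""
    return difficulty * max_velocity


if __name__ == "__main__":
    for difficulty in (0.70, 0.85):
        print(difficulty,
              round(guaranteed_speed(difficulty), 3),
              round(top_commanded_speed(difficulty), 3))
```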
Speed Task
Topic to discuss tuning the configuration for the speed task
Current Results
Currently, Aliengo achieves 60% of the speed curriculum difficulty, which corresponds to 60% of the maximum velocity ($3[\frac{m}{s}]$) → $\approx 1.8[\frac{m}{s}]$, with a very nice gait pattern and no noticeable simulator glitches.
Latest Notable Improvements
Height Tracking Soft Exponential Kernel
I reused the soft exponential kernel definition presented in #20 to track the robot height. Function visualization. This allows a sharp exponential kernel while still tolerating some deviation from the exact height target. Moreover, I implemented the proprioceptive height computation, which enables this function on all sorts of terrain, an improvement over what was originally implemented in 'orbit'.
Foot Closeness penalty
I added a penalty for feet that are too close to each other (in the xy plane). The distance is the Euclidean distance (thus one can define a circle around each foot within which other feet get a penalty). Three types of kernel have been implemented: 'constant', 'linear' and 'quadratic'. The 'constant' kernel has been tested: it successfully prevents the robot from superposing its feet and fixed the problematic behaviour.
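A minimal sketch of such a penalty is shown below. The exact kernel shapes are not specified in this thread, so the 'linear' and 'quadratic' forms here (ramping from 0 at the circle boundary to 1 when the feet coincide) are assumptions, as are the function names and the pairwise summation.

```python
import math
from itertools import combinations


def closeness_kernel(distance, radius, kind="constant"):
    """Penalty for one pair of feet separated by `distance` in the xy plane."""
    if distance >= radius:
        return 0.0  # outside the circle around the foot: no penalty
    if kind == "constant":
        return 1.0
    depth = 1.0 - distance / radius  # 0 at the boundary, 1 when superposed
    if kind == "linear":
        return depth
    if kind == "quadratic":
        return depth ** 2
    raise ValueError(f"unknown kernel: {kind!r}")


def foot_closeness_penalty(feet_xy, radius, kind="constant"):
    """Sum the kernel over every pair of feet, given as (x, y) tuples."""
    total = 0.0
    for (xa, ya), (xb, yb) in combinations(feet_xy, 2):
        total += closeness_kernel(math.hypot(xa - xb, ya - yb), radius, kind)
    return total


if __name__ == "__main__":
    # Two front feet nearly superposed, hind feet well apart (made-up values).
    feet = [(0.0, 0.0), (0.05, 0.0), (0.4, 0.3), (0.4, -0.3)]
    for kind in ("constant", "linear", "quadratic"):
        print(kind, foot_closeness_penalty(feet, radius=0.1, kind=kind))
```

The 'constant' kernel penalizes any intrusion equally, whereas the other two give a gradient that grows as the feet get closer.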
Interesting results
After extensive training, a gallop gait emerged. However, it exploits a simulation artefact by crossing legs, which shouldn't be possible on a real setup, where self-collision applies. Still, it is interesting to see that a gallop gait may indeed be an optimum, and I believe that we may see a correct one with a bit more exploration.
Potential limitation
The way the curriculum is computed may still hinder progression. Indeed, only the distance moved away from the origin is rewarded when traveling at high speed. This doesn't take into account the total walked distance, since the robot also has an angular velocity to track. Computing the total distance effectively may indeed be tricky.
Next Steps