Batou1406 opened 5 months ago
Previously, the friction cone was used to compute a penalty for violations that would result in slipping. However, I believe this wasn't as effective as intended. Instead, I'll try a new constraint that penalizes foot displacement (i.e. foot speed) while the foot is supposed to be in contact (i.e. when the model-base variable `c` is equal to 0).
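A minimal sketch of such a penalty, assuming per-foot xy velocities and the contact variable `c` are available as tensors (names and shapes are illustrative, not the actual implementation):

```python
import torch

def foot_slip_penalty(foot_vel_xy: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """Penalize foot displacement (xy speed) for feet that should be in stance.

    foot_vel_xy: foot velocities in the xy plane, shape (num_envs, num_feet, 2).
    c: model-base contact variable, shape (num_envs, num_feet);
       c == 0 means the foot is supposed to be in contact.
    """
    in_stance = (c == 0).float()
    foot_speed = torch.norm(foot_vel_xy, dim=-1)      # (num_envs, num_feet)
    return torch.sum(in_stance * foot_speed, dim=-1)  # per-env penalty
```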
Previously, a reward was given for tracking a velocity in the xy plane. However, this was problematic: the changes of slope in the terrain are so big that, from the robot's perspective, it seemed like it had to walk into a wall. Instead, reward is now given for the average velocity away from the terrain origin; the velocity command is thus not really needed anymore.
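A possible shape for this reward, assuming each environment's terrain origin and the elapsed episode time are known (the kernel and names are illustrative guesses, not the exact implementation):

```python
import torch

def average_velocity_reward(base_pos_xy, terrain_origin_xy, elapsed_time,
                            target_speed, std=0.25):
    """Reward the average speed achieved away from the terrain origin,
    instead of tracking an instantaneous velocity command."""
    distance = torch.norm(base_pos_xy - terrain_origin_xy, dim=-1)
    avg_speed = distance / elapsed_time.clamp(min=1e-6)
    return torch.exp(-torch.square(avg_speed - target_speed) / std**2)
```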
The way 'orbit' implemented the terrain curriculum is:
This means that the robot won't see the easy terrain anymore as it progresses in the curriculum and, if it is stuck at difficulty 5, it would only see terrain of that difficulty.
Instead, I propose another way to progress in the curriculum:

- When `terrain_difficulty == current_difficulty`: evaluate the robot's performance and decide to increase, decrease, or keep constant the `current_difficulty`.
- Sample `terrain_difficulty` based on the `current_difficulty` with the following law:
  - In (1 − p) of the cases: sample `terrain_difficulty` with a uniform law in [0, `current_difficulty`]
  - In (p) of the cases: set `terrain_difficulty` to `current_difficulty` (i.e. the current maximal difficulty)

This would lead to greater randomization of the terrain while maintaining sufficient exploration in the highest-difficulty terrain (modulated by the parameter p).
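A sketch of this sampling rule, assuming per-environment difficulties are stored as a float tensor (a continuous uniform sample is shown; one may round to integer levels):

```python
import torch

def sample_terrain_difficulty(current_difficulty: torch.Tensor, p: float) -> torch.Tensor:
    """With probability p, train at the current maximal difficulty; otherwise
    sample uniformly in [0, current_difficulty] to keep revisiting easier terrain."""
    at_max = torch.rand_like(current_difficulty) < p
    uniform = torch.rand_like(current_difficulty) * current_difficulty
    return torch.where(at_max, current_difficulty, uniform)
```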
We agreed that the foot-displacement penalty (for legs in stance) was better than the friction-cone violation penalty, since it doesn't rely on the friction coefficient µ. In addition, we didn't agree yet on whether it is better to impose the constraints or only penalize violations during training (at deployment, we can certainly impose the constraints):
However, actually enforcing the constraints is not straightforward and may be infeasible. For example, if the robot is going downward with its CoM above the two front legs, it can generate a large $F_{xy}$ without violating the friction cone, which would make the two hind legs slip.
Finally, it has been decided to implement the function to enforce the constraints and to train two policies to evaluate the differences.
Open questions that need answers
The friction-cone constraint has been implemented and tested over some values. As expected, it doesn't fully prevent the legs from slipping.
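For reference, a sketch of the friction-cone violation penalty being discussed, assuming world-frame ground-reaction forces are available (names are illustrative):

```python
import torch

def friction_cone_violation(contact_forces: torch.Tensor, mu: float) -> torch.Tensor:
    """Penalty for violating the friction cone ||F_xy|| <= mu * F_z.

    contact_forces: ground-reaction forces, shape (num_envs, num_feet, 3).
    """
    f_xy = torch.norm(contact_forces[..., :2], dim=-1)   # tangential force
    f_z = contact_forces[..., 2].clamp(min=0.0)          # normal force
    violation = (f_xy - mu * f_z).clamp(min=0.0)         # amount outside the cone
    return torch.sum(violation, dim=-1)                  # per-env penalty
```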
We stick to the velocity command, but try to allow a wider range of speeds for maximal reward (like a plateau), giving some flexibility in the speed tracking. The size of the plateau should vary with the terrain difficulty.
We've decided to stick to velocity tracking, since this is the easiest for the problem definition, and we did not find any good formulation with position tracking.
However, we decided to relax the constraint on speed tracking proportionally to the terrain difficulty and the required speed. For this, I created a new reward function.
Originally, the velocity was tracked with an exponential kernel, which gives a reward of 1 if the robot's speed equals the desired speed, and decays exponentially to 0 (with standard deviation $std$) as they diverge.
The new kernel aims to relax the constraint on speed tracking and allow the robot to obtain the maximal reward over a larger range of speeds. Within some tolerance, the robot can obtain the maximum reward, which should give it more freedom in speed tracking on challenging obstacles.
The new function aims to relax the constraint according to the terrain difficulty and the commanded speed. For this, we define the parameter $tolerance$: $$tolerance = \alpha \cdot || \vec v_{desired} || \cdot difficulty$$ with $\alpha$ a tuning parameter.
Then, we aim to relax the constraint on the robot speed only in the direction of the desired speed. For that, we project the robot speed onto the desired speed: $$\vec v_{rob,xy} = (v_x, v_y) \to \vec v_{rob,x'y'} = (v_{x'}, v_{y'})$$ with $(x', y')$ the new axes, where $x'$ is parallel to $\vec v_{cmd}$ and $y'$ is perpendicular to $\vec v_{cmd}$.
With this new formulation, we can compute the speed tracking error as two terms:
with $\theta$ the angle between $\vec v_{cmd}$ and $\vec v_{rob}$.
With the tolerance parameter and the relaxing direction, one can then relax the constraint on the forward speed tracking error with a piecewise function:
Finally, one can compute the exponential kernel as usual: $$e^{-\frac{\text{relaxed forward speed error}^2 + \text{lateral speed error}^2}{std^2}}$$
This function has the benefit of remaining continuous and differentiable on $\mathbb{R}^2$.
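Putting the pieces together, a minimal sketch of the relaxed kernel, assuming difficulty is normalized to [0, 1]; the hinge used for the piecewise relaxation is one plausible choice that keeps the kernel continuous and differentiable after squaring:

```python
import torch

def relaxed_velocity_tracking_reward(v_rob_xy, v_cmd_xy, difficulty, alpha, std):
    """Exponential velocity-tracking kernel with a tolerance band in the
    direction of the commanded velocity."""
    cmd_speed = torch.norm(v_cmd_xy, dim=-1)
    tolerance = alpha * cmd_speed * difficulty

    # Project the robot velocity onto the command direction (x' axis).
    cmd_dir = v_cmd_xy / cmd_speed.clamp(min=1e-6).unsqueeze(-1)
    v_forward = torch.sum(v_rob_xy * cmd_dir, dim=-1)          # ||v_rob|| cos(theta)
    v_lateral = v_rob_xy - v_forward.unsqueeze(-1) * cmd_dir   # y' component
    lateral_error = torch.norm(v_lateral, dim=-1)              # ||v_rob|| |sin(theta)|

    # Piecewise relaxation: zero error inside the tolerance band around the
    # commanded speed, linear outside (squared below, so the kernel stays smooth).
    forward_error = torch.abs(v_forward - cmd_speed)
    relaxed_forward_error = (forward_error - tolerance).clamp(min=0.0)

    return torch.exp(-(relaxed_forward_error**2 + lateral_error**2) / std**2)
```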
I trained a new policy with fewer weights and the latest implementation → and it works very well! The weights are:
However, the curriculum kind of stops around level 4-5, and I believe it could do better!
Maybe it is just not progressing through the terrain sufficiently quickly, given the episode length, for it to reach the success condition. This may make sense: with harder terrain and the soft kernel, more flexibility is given on the speed tracking, so it will indeed go a bit slower. Also, the terrains are quite big with respect to the 'standard' ones.
One option would be to make it progress to harder terrain as long as it doesn't fall, or to make it progress after a shorter distance traveled.
There was a mistake in the way I sampled between a random terrain difficulty and the max difficulty. This should now be fixed.
Moreover, I changed the curriculum threshold for the climb terrain. Now:
Climb Terrain Curriculum - Can't progress to harder terrain
Context
The objective is to train a policy able to climb up and down stairs. For that, a terrain called `STAIRS_TERRAINS_CFG` has been prepared. It consists of successive pyramids (or inverted pyramids), called sub-terrains, with increasingly larger step sizes. The robot should learn how to traverse these sub-terrains. The command is mainly a forward speed, and we're not interested in omnidirectional walking.
Parameters
Problem
The robot struggles to progress to harder sub-terrains and is often stuck at the border between sub-terrains. After careful debugging, it doesn't seem to be a terrain issue (bug), but more a policy/configuration issue (tuning). It just didn't learn how to transition between sub-terrains. This makes some sense: if it was going downstairs, it then needs to go upstairs, which, for terrain with a big step size, is quite a radical change. Almost all the weight is on the front during a descent, and transitioning would require very large torques/forces and maybe some kind of manoeuvre that may not be beneficial in terms of the cost function. Please note that this is observed only in the pyramidal terrain. The inverse pyramid doesn't suffer as much from this problem, since it is easier to transition.
Moreover, another problem is how the curriculum is defined. In order to progress in the terrain, there are two conditions:
The problem comes from how the progress is defined. 50% of the terrain distance simply means that the robot has reached the sub-terrain border. Since it struggles to transition between terrains, this limits the progress. In other words, the robot has reached the terrain border, so it has successfully made its way through the terrain, but due to how the curriculum is computed, it doesn't make it to a harder terrain. In addition, the maximum forward speed is set to $0.5[\frac{m}{s}]$, which means it would reach the border in $10[s]$ and then be stuck for $10[s]$, given the total $20[s]$ duration of an episode. The average speed would then be ~50% of the commanded speed, so the robot may even regress even though it made it to the border. One may consider changing the episode duration or the maximal forward speed.
Finally, a last problem is that the robot may progress in the terrain even though it fell. It could make large progress in the terrain just because it fell down the stairs. Advancing the difficulty in this case should be avoided.
Solution
Several solutions exist for these problems, but they may result in different behaviours.
1. Tune the terrain configuration to make the sub-terrain borders traversable
One could put a flat border between terrains. This would ease the sub-terrain transition, but it would also decrease the problem difficulty, which may not be what we want.
2. Tune the cost function to make the robot traverse sub-terrain borders
However, this may be complex and not desirable.
3. Change the progress and regress condition in the curriculum term
One could decrease the threshold on the distance walked, so that the robot progresses when it is close to the border, and not only when it has crossed it.
4. Change the commanded velocity
If the robot is told to go diagonally, it could make more progress before reaching the border, though this tweaks the problem.
5. Make robots that fall regress in terrain
Straightforward and should be implemented; a sketch combining this with option 3 is given after the list.
6. Diminish the episode length
This would avoid the problem of regressing because the robot was stuck at the border. Moreover, it may be beneficial for training wall-clock time. This would be similar to increasing the sub-terrain size.
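A sketch of the curriculum update combining options 3 and 5, with illustrative thresholds and names (the actual conditions live in the curriculum term):

```python
import torch

def update_terrain_difficulty(current_difficulty, distance_walked, terrain_length,
                              fell, progress_fraction=0.4, max_difficulty=9):
    """Per-environment curriculum update at episode end."""
    # Option 5: regress robots that fell, even if they traveled far
    # (e.g. by falling down the stairs).
    regress = fell
    # Option 3: progress robots that got close to the sub-terrain border
    # (threshold below the 50% border-crossing distance) without falling.
    progress = (distance_walked > progress_fraction * terrain_length) & ~fell
    new_difficulty = current_difficulty + progress.long() - regress.long()
    return new_difficulty.clamp(0, max_difficulty)
```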