Batou1406 / dls_orbit_bat_private

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-orbit.github.io/orbit/

Climb Terrain Curriculum - Can't progress to harder terrain #20

Open Batou1406 opened 5 months ago

Batou1406 commented 5 months ago

Climb Terrain Curriculum - Can't progress to harder terrain

Context

The objective is to train a policy able to climb up and down stairs. For that, a terrain called STAIRS_TERRAINS_CFG has been prepared. It consists of successive pyramids (or inverted pyramids), called sub-terrains, with increasingly larger step sizes. The robot should learn how to traverse these sub-terrains. The command is mainly a forward speed; we are not interested in omnidirectional walking.

Parameters

Problem

The robot struggles to progress to harder sub-terrains and is often stuck at the border between sub-terrains. After careful debugging, it does not seem to be a terrain issue (bug), but rather a policy/configuration issue (tuning). The policy simply did not learn how to transition between sub-terrains. This makes some sense: if the robot was going downstairs, it then needs to go upstairs, which, for terrains with a big step size, is quite a radical change. Almost all the weight is on the front legs during a descent, and transitioning would require very large torques/forces and maybe some kind of manoeuvre that may not be beneficial in terms of the cost function. Please note that this is observed only in the pyramidal terrain. The inverted pyramid does not suffer as much from this problem, since the transition is easier there.


Moreover, another problem is how the curriculum is defined. In order to progress in the terrain there are two conditions:

- to move up, the robot must have walked at least 50% of the sub-terrain distance (i.e. reached the sub-terrain border);
- to avoid moving down, its average speed over the episode must stay sufficiently close to the commanded speed.

The problem comes from how progress is defined. Walking 50% of the terrain distance simply means that the robot has reached the sub-terrain border. Since it struggles to transition between terrains, this limits the progress. In other words, the robot has reached the terrain border, so it has successfully made its way through the terrain, but due to how the curriculum is computed, it does not make it to a harder terrain. In addition, the maximum forward speed is set to $0.5[\frac{m}{s}]$, which means it would reach the border in $10[s]$ and then be stuck for $10[s]$, given the total $20[s]$ duration of an episode. The average speed would then be ~50% of the commanded speed, so the robot may even regress even though it made it to the border. One may consider changing the episode duration or the maximal forward speed.

Finally, a last problem is that the robot may progress in the terrain even though it fell. It could make a large progress in the terrain just because it fell down the stairs. Advancing the difficulty in this case should be avoided.

Solution

Several solutions exist for these problems, but they may end up producing different behaviours.

1. Tuning the terrain configuration to make the sub-terrain borders traversable

One could put a flat border between terrains. This would ease the sub-terrain transition, but it would also decrease the problem difficulty, which may not be what we want.

2. Tune the cost function to make the robot traverse sub-terrain borders

However, this may be complex and not desirable.

3. Change the progress and regress conditions in the curriculum term

One could decrease the threshold on the distance walked, so that the robot progresses when it is close to the border, and not only once it has crossed it.

4. Change the commanded velocity

If the robot is told to go diagonally, it could make greater progress before reaching the border, thus slightly changing the problem.

5. Make robots that fall regress in the terrain

Straightforward and should be implemented.

6. Reduce the episode length

This would avoid the problem of regressing because the robot was stuck at the border. Moreover, this may be beneficial for the training wall-clock time. It would be similar to increasing the sub-terrain size.

Batou1406 commented 5 months ago

Modification in the Terrain Curriculum

  1. A flat border between terrains has been added -> easing the terrain-border traversability
  2. The threshold for progress has been decreased from 100% of the distance to the border to 80%.
  3. The episode length has been reduced from 20s to 12s, but will be set to 15s.
  4. The distance used to progress in the terrain was computed as the 2D distance (XY world plane). However, in this kind of terrain, with a very large height difference between start and finish, the 3D distance is more representative of the progress.
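For reference, a minimal sketch of the updated progress check in torch, assuming start and current base positions are available per environment (the function and argument names are illustrative, not the actual repo code):

```python
import torch

def terrain_progressed(start_pos: torch.Tensor,
                       cur_pos: torch.Tensor,
                       sub_terrain_size: float,
                       threshold: float = 0.8) -> torch.Tensor:
    """Return a boolean mask of environments that earned a harder terrain.

    start_pos, cur_pos: (num_envs, 3) world-frame base positions.
    sub_terrain_size:   edge length of one sub-terrain [m]; the robot
                        spawns at its center, so the border is size/2 away.
    threshold:          fraction of the distance-to-border counted as
                        success (0.8 after the change above).
    """
    # 3D displacement: on stairs the height difference between start and
    # current position is large, so the 3D norm reflects progress better
    # than the XY distance alone.
    distance = torch.norm(cur_pos - start_pos, dim=1)
    return distance > threshold * (sub_terrain_size / 2.0)
```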

New friction constraints

Previously, a friction cone was used to compute a penalty for violations that would result in slipping. However, I believe this wasn't as effective as intended. Instead, I'll try a new constraint that penalizes foot displacement (i.e. foot speed) while the foot is supposed to be in contact (i.e. when the model base variable 'c' is equal to 0).
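A minimal sketch of such a penalty, assuming foot velocities and the contact variable c are available as torch tensors (names are illustrative):

```python
import torch

def foot_slip_penalty(foot_vel: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """Penalize tangential foot motion while the foot should be in stance.

    foot_vel: (num_envs, num_feet, 3) foot linear velocities [m/s].
    c:        (num_envs, num_feet) contact variable; per the convention
              above, c == 0 marks a foot that is supposed to be in contact.
    """
    stance = (c == 0).float()
    # Squared XY speed of each foot, counted only for stance feet.
    slip = torch.sum(foot_vel[..., :2] ** 2, dim=-1)
    return torch.sum(slip * stance, dim=1)
```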

New tracking reward

Previously, a reward was given for tracking a velocity in the XY plane. However, this was problematic since the changes of slope in the terrain are so big that, from the robot's perspective, it seemed it needed to walk into a wall. Instead, the reward is now given for the average velocity from the terrain origin; the velocity command is thus not really needed anymore.
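A minimal sketch of this reward, assuming the terrain origin and the elapsed episode time are available per environment (the exponential kernel and all names here are assumptions for illustration):

```python
import torch

def avg_velocity_reward(cur_pos: torch.Tensor,
                        origin: torch.Tensor,
                        elapsed_time: torch.Tensor,
                        target_speed: float,
                        std: float) -> torch.Tensor:
    """Reward the average speed of travel away from the terrain origin.

    cur_pos, origin: (num_envs, 3) world-frame positions.
    elapsed_time:    (num_envs,) seconds since the episode started.
    """
    # Average speed = distance covered from the origin / elapsed time.
    avg_speed = torch.norm(cur_pos - origin, dim=1) / elapsed_time.clamp(min=1e-3)
    # Exponential kernel peaking when the average speed hits the target.
    return torch.exp(-((avg_speed - target_speed) ** 2) / std**2)
```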

Batou1406 commented 5 months ago

FuM - Giulio's update

Add more terrain randomization

The way 'orbit' implemented the terrain curriculum is: each environment always trains on a terrain of exactly its current difficulty level.

This means that the robot won't see the easy terrain anymore as it progresses in the curriculum and, if it is stuck at difficulty 5, it will only see terrain of that difficulty.

Instead, I propose another way to progress in the curriculum: with probability p an environment trains on its hardest unlocked difficulty, and otherwise on a difficulty sampled at random among the levels unlocked so far (see the sketch below).

This would lead to greater randomization of the terrain while maintaining sufficient exploration of the higher-difficulty terrain (modulated by the parameter p).
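A minimal sketch of such a sampling rule in torch, assuming a per-environment tensor of unlocked levels (the names and the exact sampling scheme are my reading of the proposal above):

```python
import torch

def sample_difficulty(max_level: torch.Tensor, p: float) -> torch.Tensor:
    """Sample a terrain difficulty level per environment.

    max_level: (num_envs,) long tensor, highest level unlocked so far.
    p:         probability of training on the hardest unlocked level;
               otherwise a level is drawn uniformly from [0, max_level],
               so easy terrain stays in the mix.
    """
    num_envs = max_level.shape[0]
    # Uniform integer level in [0, max_level] for each environment.
    rand = torch.rand(num_envs, device=max_level.device)
    rand_level = (rand * (max_level + 1).float()).long()
    # With probability p, override with the hardest unlocked level.
    use_max = torch.rand(num_envs, device=max_level.device) < p
    return torch.where(use_max, max_level, torch.minimum(rand_level, max_level))
```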

Friction Constraints

We agreed that the foot displacement penalty (for legs in stance) is better than the friction cone violation penalty, since it doesn't rely on the friction coefficient µ. In addition, we did not yet agree on whether it is better to enforce the constraint or only penalize violations during training (at deployment time, we can certainly enforce the constraint).

However, actually enforcing the constraint is not straightforward and may be infeasible. For example, if the robot is going downward with its CoM above the two front legs, it can generate a large $F_{xy}$ without violating the friction cone, which would make the two hind legs slip.

Finally, it has been decided to implement the function that enforces the constraint and to train two policies to evaluate the difference.
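One simple per-foot enforcement is to rescale the tangential force so it stays inside the cone; a minimal sketch, assuming per-foot ground-reaction forces with z up (note that this per-foot projection does not address the multi-contact failure case described above):

```python
import torch

def enforce_friction_cone(forces: torch.Tensor, mu: float) -> torch.Tensor:
    """Project per-foot contact forces into the friction cone.

    forces: (num_envs, num_feet, 3) ground-reaction forces, z up.
    mu:     friction coefficient.
    The tangential component is rescaled so that ||F_xy|| <= mu * F_z.
    """
    f_z = forces[..., 2].clamp(min=0.0)
    f_xy = forces[..., :2]
    f_xy_norm = torch.norm(f_xy, dim=-1, keepdim=True).clamp(min=1e-6)
    # Scale factor is 1 inside the cone, < 1 outside.
    scale = torch.clamp(mu * f_z.unsqueeze(-1) / f_xy_norm, max=1.0)
    return torch.cat([f_xy * scale, forces[..., 2:]], dim=-1)
```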

Tracking reward: Velocity or position: how to do it?

Open question that needs an answer.

Batou1406 commented 5 months ago

The friction cone constraint has been implemented and tested over some values. As expected, it doesn't fully prevent the legs from slipping.

Batou1406 commented 5 months ago

Tracking reward: Velocity or position: how to do it?

We stick to the velocity command, but we try to allow a wider range of speeds for maximal reward (like a plateau), to allow some flexibility in the speed tracking. The size of the plateau should vary with the terrain difficulty.

Batou1406 commented 5 months ago

Tracking Reward: Velocity or Position: How to do it?

We've decided to stick to velocity tracking, since this is the easiest for the problem definition, and we did not find any good formulation with position tracking.

However, we decided to relax the constraint on speed tracking proportionally to the terrain difficulty and the required speed. For this I created a new reward function.

Originally, the velocity was tracked with an exponential kernel, which gives a reward of 1 if the robot's speed equals the desired speed and decays exponentially to 0, with standard deviation $std$, as they diverge: $$e^{-\frac{\|\vec v_{rob} - \vec v_{desired}\|^2}{std^2}}$$

New soft exponential kernel

This new kernel aims to relax the constraint on speed tracking and allow the robot to obtain the maximal reward over a larger range of speeds. Within some tolerance, the robot obtains the maximum reward, which should give it more freedom in speed tracking on challenging obstacles.

Tolerance

The new function aims to relax the constraint according to the terrain difficulty and the commanded speed. For this we define the parameter tolerance: $$tolerance = \alpha \cdot \|\vec v_{desired}\| \cdot difficulty$$ with $\alpha$ a tuning parameter.

Relaxing direction

Then, we aim to relax the constraint on the robot speed only in the direction of the desired speed. For that, we project the robot speed onto the desired speed: $$\vec v_{rob,xy} = (v_x, v_y) \to \vec v_{rob,x'y'} = (v_{x'}, v_{y'})$$ with $(x', y')$ the new axes, where $x'$ is parallel to $\vec v_{cmd}$ and $y'$ is perpendicular to $\vec v_{cmd}$.

With this new formulation, we can compute the speed tracking error as two terms:

- the forward speed error: $\|\vec v_{cmd}\| - \|\vec v_{rob}\| \cos\theta$
- the lateral speed error: $\|\vec v_{rob}\| \sin\theta$

with $\theta$ the angle between $\vec v_{cmd}$ and $\vec v_{rob}$.

Relaxing

With the tolerance parameter and the relaxing direction, one can then relax the constraint on the forward speed tracking error with a piecewise function; a natural choice, consistent with the plateau described above, is $$relaxed\ forward\ speed\ error = \max(0,\; |forward\ speed\ error| - tolerance)$$ so that errors inside the tolerance band are not penalized at all.

Bringing everything together

Finally, one can compute the exponential kernel as usual: $$e^{-\frac{(relaxed\ forward\ speed\ error)^2 + (lateral\ speed\ error)^2}{std^2}}$$

This function has the benefit of remaining continuous and differentiable on $\mathbb{R}^2$.
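Bringing the pieces together, a minimal torch sketch of the full soft kernel, assuming the max(0, |e| − tolerance) relaxation above (all names are illustrative):

```python
import torch

def soft_velocity_tracking_reward(v_rob: torch.Tensor,
                                  v_cmd: torch.Tensor,
                                  difficulty: torch.Tensor,
                                  alpha: float,
                                  std: float) -> torch.Tensor:
    """Soft exponential kernel for velocity tracking (sketch).

    v_rob, v_cmd: (num_envs, 2) robot / commanded XY velocities.
    difficulty:   (num_envs,) current terrain difficulty (float).
    """
    v_cmd_norm = torch.norm(v_cmd, dim=1).clamp(min=1e-6)
    # Tolerance grows with the commanded speed and the terrain difficulty.
    tolerance = alpha * v_cmd_norm * difficulty
    # Project the robot velocity onto the command direction (x') and its
    # perpendicular (y').
    dir_cmd = v_cmd / v_cmd_norm.unsqueeze(1)
    v_forward = torch.sum(v_rob * dir_cmd, dim=1)  # ||v_rob|| cos(theta)
    v_lateral = (v_rob[:, 1] * dir_cmd[:, 0]
                 - v_rob[:, 0] * dir_cmd[:, 1])    # ||v_rob|| sin(theta)
    # Relax only the forward error, inside the tolerance band.
    forward_error = torch.clamp((v_cmd_norm - v_forward).abs() - tolerance,
                                min=0.0)
    return torch.exp(-(forward_error**2 + v_lateral**2) / std**2)
```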

Visualization

[figures omitted]

Function visualization

[figure omitted]

Batou1406 commented 5 months ago

Update

Batou1406 commented 5 months ago

Task Results

I trained a new policy with fewer weights and the latest implementation → and it works very well! The weights are:

[attachments omitted: good_climb_few_w2, good_climb_few_w2_2, climbup1]

Curriculum Improvement

However, the curriculum kind of stops around level 4-5, and I believe it could do better! [figure omitted]

Maybe it is just not progressing sufficiently quickly in the terrain, given the episode length, for it to reach the success condition. This may make sense: with harder terrain and the soft kernel, more flexibility is given on the speed tracking, and the robot will indeed go a bit slower. Also, the terrains are quite big with respect to the 'standard' ones.

One option would be to make it progress to harder terrain as long as it doesn't fall, or to make it progress after a shorter distance traveled.

Batou1406 commented 4 months ago

There was a mistake in the way I sampled between a random terrain difficulty and the max difficulty. This has now been fixed.

Moreover, I changed the curriculum thresholds for the climb terrain. Now:

Increase difficulty if:
Decrease difficulty if: