Open hdelecki opened 1 year ago
The example was removed in the latest version, but the mechanism still exists for people to create their own version. The ant ARS example should work, but might require some tuning of hyperparameters to make it walk properly.
Thanks so much! Do you know why the previous version using halfcheetah didn't work?
Are there any specific mechanism parameters or functions I would need to implement to create an environment for halfcheetah similar to the new ant ARS?
Not exactly sure what the issue was before, but there were a lot of changes to the simulation and contact behavior. I believe the training success is rather sensitive to these parameters and to the reward function, so that could be what broke it.
As a rough starting point: you may need to adapt the `get_state` function or how the reward is calculated, because some dimensions are going to be different. Also have a look at the `rollout_policy` function; I guess the tradeoff between control action and forward reward is important. And you can also change the scale of parameters in `Policy` and the noise in `HyperParameters` to change the initial parameters and their updates.

If you get something to work, or also improvements to the ant, open a pull request and we can integrate that.
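To make the knobs above concrete, here is a minimal, self-contained sketch of the core ARS update, showing where the parameter scale and the exploration noise enter. This is not the Dojo.jl implementation: the `Policy` struct, the `evaluate` objective, and all hyperparameter names here are hypothetical stand-ins (in Dojo, `evaluate` would roll out the policy in the environment and accumulate rewards).

```julia
using Random, Statistics, LinearAlgebra

# Linear policy: action = W * state. `scale` controls the initial
# parameter magnitude (the "scale of parameters in Policy" above).
struct Policy
    W::Matrix{Float64}
end
Policy(n_act, n_obs; scale=0.1) = Policy(scale .* randn(n_act, n_obs))

# Placeholder return estimate; a toy smooth objective with optimum at W .== 1.
# In a real setup this would be a rollout return (forward reward minus control cost).
evaluate(W) = -norm(W .- 1.0)

function ars_step!(policy::Policy; noise=0.05, step=0.02, n_dirs=8)
    W = policy.W
    deltas = [randn(size(W)) for _ in 1:n_dirs]
    # Evaluate symmetric perturbations; `noise` is the exploration
    # standard deviation (the "noise in HyperParameters" above).
    r_plus  = [evaluate(W .+ noise .* d) for d in deltas]
    r_minus = [evaluate(W .- noise .* d) for d in deltas]
    # Gradient-free update: weight each direction by its reward
    # difference, normalized by the reward standard deviation.
    σ = std(vcat(r_plus, r_minus)) + 1e-8
    g = sum((r_plus[i] - r_minus[i]) .* deltas[i] for i in 1:n_dirs)
    W .+= (step / (n_dirs * σ)) .* g
    return policy
end
```

If the learned policy drifts backward, the balance between the forward-progress term and the control-cost term inside the return (here hidden in `evaluate`) is the first thing to inspect; the `noise` and `step` values then control how aggressively the search moves.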
Running the `halfcheetah_ars.jl` example, I expected to see policy behavior similar to what is shown in the docs. Instead, I see that ARS gets a mean reward of around -23 and the resulting policy tends to move backward. Is this the expected behavior?

I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.