dojo-sim / Dojo.jl

A differentiable physics engine for robotics
MIT License

Unexpected policy behavior in halfcheetah ARS example #74

Open: hdelecki opened this issue 1 year ago

hdelecki commented 1 year ago

Running the halfcheetah_ars.jl example, I expected to see policy behavior similar to what is shown in the docs. Instead, ARS reaches a mean reward of only around -23, and the resulting policy tends to move backward. Is this the expected behavior?

I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.

janbruedigam commented 1 year ago

The example was removed in the latest version, but the mechanism still exists, so people can build their own version. The ant ARS example should work, but it may need some hyperparameter tuning to walk properly.
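To make "tuning of hyperparameters" concrete, here is a minimal sketch of one basic ARS update (the V1-t variant from Mania et al., 2018: antithetic random directions, keep the top-b directions, scale by the standard deviation of the rewards used). It is written in Python for illustration only and is not Dojo.jl's implementation. `ars_step` and `toy_reward` are hypothetical names, and the quadratic `toy_reward` stands in for the return of a rollout with a policy parameterized by `theta`.

```python
import random
import statistics


def ars_step(theta, reward_fn, step_size=0.02, noise=0.03, num_dirs=8, top_b=4, rng=random):
    """One basic random search update (ARS V1-t style), sketch only.

    theta:     flat list of policy parameters
    reward_fn: hypothetical callable returning the rollout return for a parameter list
    """
    dim = len(theta)
    samples = []
    for _ in range(num_dirs):
        # Sample a Gaussian direction and evaluate both antithetic perturbations.
        delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        r_plus = reward_fn([t + noise * d for t, d in zip(theta, delta)])
        r_minus = reward_fn([t - noise * d for t, d in zip(theta, delta)])
        samples.append((r_plus, r_minus, delta))

    # Keep the top_b directions, ranked by the better of their two returns.
    samples.sort(key=lambda s: max(s[0], s[1]), reverse=True)
    best = samples[:top_b]

    # Normalize by the std of the 2 * top_b returns actually used (sigma_R).
    sigma_r = statistics.pstdev([r for s in best for r in s[:2]]) or 1.0

    new_theta = list(theta)
    for r_plus, r_minus, delta in best:
        coef = step_size / (top_b * sigma_r) * (r_plus - r_minus)
        new_theta = [t + coef * d for t, d in zip(new_theta, delta)]
    return new_theta


def toy_reward(theta):
    # Hypothetical stand-in for a rollout return; it peaks at theta = [1, 1, 1].
    return -sum((t - 1.0) ** 2 for t in theta)


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
for _ in range(300):
    theta = ars_step(theta, toy_reward, rng=rng)
print(toy_reward(theta))
```

Even on this toy problem, `step_size`, `noise`, `num_dirs` and `top_b` trade off speed against stability, which gives some intuition for why a walking gait can be sensitive to the same settings.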

hdelecki commented 1 year ago

Thanks so much! Do you know why the previous version using halfcheetah didn't work?

Are there any specific mechanism parameters or functions I would need to implement to create a halfcheetah environment similar to the new ant ARS example?

janbruedigam commented 1 year ago

I'm not exactly sure what the issue was, but there were many changes to the simulation and contact behavior. I believe training success is rather sensitive to these parameters and to the reward function, so that could be what broke it.
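As an illustration of how much the reward function shapes the learned behavior, here is a sketch of a per-step locomotion reward in the style of the Gym/MuJoCo HalfCheetah task: forward velocity minus a quadratic control cost. This is an assumption for illustration, not Dojo.jl's halfcheetah reward; `halfcheetah_reward` and its parameters are hypothetical.

```python
def halfcheetah_reward(x_before, x_after, action, dt, forward_weight=1.0, ctrl_cost_weight=0.1):
    """Gym-style locomotion reward (hypothetical; not Dojo.jl's definition).

    x_before, x_after: torso position along the intended forward axis
    action:            list of joint commands applied during the step
    dt:                time elapsed over the step
    """
    # Reward progress along +x; moving in -x gives a negative term.
    forward_velocity = (x_after - x_before) / dt
    # Penalize large joint commands.
    control_cost = ctrl_cost_weight * sum(u * u for u in action)
    return forward_weight * forward_velocity - control_cost


print(halfcheetah_reward(0.0, 0.05, [0.5] * 6, 0.05))   # forward step
print(halfcheetah_reward(0.0, -0.05, [0.5] * 6, 0.05))  # backward step
```

With a reward of this shape, the sign of the forward axis and the relative weight of the control cost decide what the optimizer considers "good". If the axis did not match the robot's heading, or the cost dominated, maximizing the return could favor moving backward or barely moving at all.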

As a rough starting point:

If you get something working, or make improvements to the ant example, open a pull request and we can integrate it.