Open hdelecki opened 1 year ago
The example was removed in the latest version, but the mechanism still exists for people to create their own version. The ant ARS example should work, but might require some tuning of hyperparameters to make it walk properly.
Thanks so much! Do you know why the previous version using halfcheetah didn't work?
Are there any specific mechanism parameters or functions I would need to implement to create an environment for halfcheetah similar to the new ant ARS?
Not exactly sure what the issue was before, but there were a lot of changes to the simulation and contact behavior. I believe the training success is rather sensitive to these parameters and to the reward function, so that could be what broke it.
As a rough starting point: you may need to adapt the `get_state` function or how the reward is calculated, because some dimensions are going to be different. Also have a look at the `rollout_policy` function; I guess the tradeoff between control action and forward reward is important. And you can also change the scale of parameters in `Policy` and the noise in `HyperParameters` to change the initial parameters and their updates.

If you get something to work, or also improvements to the ant, open a pull request and we can integrate that.
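To make the knobs above concrete, here is a minimal, self-contained sketch of the core ARS update, showing where the parameter scale and the exploration noise enter. This is not the Dojo.jl implementation: the `Policy` struct, the `evaluate` objective, and all hyperparameter names here are hypothetical stand-ins (in Dojo, `evaluate` would roll out the policy in the environment and accumulate rewards).

```julia
using Random, Statistics, LinearAlgebra

# Linear policy: action = W * state. `scale` controls the initial
# parameter magnitude (the "scale of parameters in Policy" above).
struct Policy
    W::Matrix{Float64}
end
Policy(n_act, n_obs; scale=0.1) = Policy(scale .* randn(n_act, n_obs))

# Placeholder return estimate; a toy smooth objective with optimum at W .== 1.
# In a real setup this would be a rollout return (forward reward minus control cost).
evaluate(W) = -norm(W .- 1.0)

function ars_step!(policy::Policy; noise=0.05, step=0.02, n_dirs=8)
    W = policy.W
    deltas = [randn(size(W)) for _ in 1:n_dirs]
    # Evaluate symmetric perturbations; `noise` is the exploration
    # standard deviation (the "noise in HyperParameters" above).
    r_plus  = [evaluate(W .+ noise .* d) for d in deltas]
    r_minus = [evaluate(W .- noise .* d) for d in deltas]
    # Gradient-free update: weight each direction by its reward
    # difference, normalized by the reward standard deviation.
    σ = std(vcat(r_plus, r_minus)) + 1e-8
    g = sum((r_plus[i] - r_minus[i]) .* deltas[i] for i in 1:n_dirs)
    W .+= (step / (n_dirs * σ)) .* g
    return policy
end
```

If the learned policy drifts backward, the balance between the forward-progress term and the control-cost term inside the return (here hidden in `evaluate`) is the first thing to inspect; the `noise` and `step` values then control how aggressively the search moves.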
Running the `halfcheetah_ars.jl` example, I expected to see policy behavior similar to what is shown in the docs. Instead, I see that ARS gets a mean reward of around -23 and the resulting policy tends to move backward. Is this the expected behavior?

I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.