Closed anahrendra closed 1 year ago
Hi @anahrendra , thanks for this note. You're right that the implementation and paper differ slightly, and I didn't catch this before! The code is correct, the paper contains a mistake here.
I think the effect of this should be a constant scaling of the reward function, since these two reward terms are added together at every timestep: $[(1.0 - C^\text{cmd}_\text{foot}) + C^\text{cmd}_\text{foot}] = 1.0$. So I would expect the impact of this change on the final policy to be minor, but do let me know if you see otherwise (there are a lot of operations composed on this term).
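For intuition, here is a minimal numeric sketch of the coefficient argument above. The contact-schedule values are made up for illustration; this is not code from the repo:

```python
# Hypothetical contact-schedule values C_foot^cmd in [0, 1]
# (illustrative only, not taken from the walk-these-ways codebase).
c_cmd = [0.0, 0.25, 0.5, 0.75, 1.0]

for c in c_cmd:
    # The two complementary coefficients applied to the pair of reward terms.
    coeff_a = 1.0 - c
    coeff_b = c
    # Summed at every timestep, the coefficients always total 1, so swapping
    # which term receives which coefficient cannot change the total weight
    # placed on the pair -- only how it is split between the two terms.
    assert abs((coeff_a + coeff_b) - 1.0) < 1e-12
```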
-Gabe
Hi! Thanks for your implementation. That makes sense.
However, I tried running a training with your code (with nothing changed at all), but I could not reproduce the results of your pretrained weights. A brief summary is as follows:
Could you by any chance train directly from your GitHub code to verify whether something is missing? And how many iterations are required to produce a good result?
Thanks in advance for your help!
@anahrendra , I did some local testing, and I suspect this is because of the larger gravity range in this repo's default config: https://github.com/Improbable-AI/walk-these-ways/blob/master/scripts/train.py#L49

In the paper, we used a max gravity randomization of 1.0, but I seem to have provided the config here with a more challenging max gravity randomization of 2.0. Training with this range might be unstable. Sorry about that!
Please try changing line 49 of train.py to `Cfg.domain_rand.gravity_range = [-1.0, 1.0]`
And let me know if that fixes the issue. On my machine, it converges after around 10k iterations.
By the way, the friction range in this codebase is also a bit wider than in the paper, and the max footswing height is higher. See Appendix Table 6 of https://arxiv.org/pdf/2212.03238.pdf for the values used in the paper. I may update the repo soon to restore all of the original parameters.
P.S. To debug with faster convergence and verify that everything else is working properly, you can try turning off gravity randomization entirely (`Cfg.domain_rand.gravity_range = [-0.0, 0.0]`).
Hi!
Thanks a lot for your detailed support. I will try your fix as soon as possible and let you know about the results.
Have a nice weekend!
I have updated the default parameters in train.py (728058d24303cd209e5d82d8d2c0c240a40d2841).
Hi! I just noticed that there is a difference in the reward function between the paper and the code in this repo. In the paper, it is written as follows:

![Screenshot from 2023-01-31 22-04-56](https://user-images.githubusercontent.com/36684723/215768030-4cbca8cd-ba2f-461a-84c9-0923ba20dda4.png)

However, the code implementation contains a `1-torch.exp()` part, which makes it different from the formulation in the paper. Could you tell me which one is correct? Thanks!
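To make the difference concrete, the two candidate forms are complements of each other. This is a sketch with a made-up tracking error `e` and temperature `sigma`, not the repo's actual reward code:

```python
import math

# Hypothetical tracking errors and temperature (illustrative values only).
sigma = 0.25
errors = [0.0, 0.1, 0.5, 2.0]

for e in errors:
    exp_form = math.exp(-e / sigma)            # 1 at e = 0, decaying toward 0
    one_minus_form = 1.0 - math.exp(-e / sigma)  # 0 at e = 0, growing toward 1
    # For any error, the two forms sum to exactly 1, so one is the
    # complement of the other rather than an entirely different shaping.
    assert abs(exp_form + one_minus_form - 1.0) < 1e-12
```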