Closed by jc-bao 1 year ago
Initial values: A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2; A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2
Now: a1_max = 0.45 m/s^2, a2_max = 1.8 m/s^2
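The initial numbers are consistent with a peak acceleration of `A * w**2`, which is what a sinusoidal reference `x(t) = A * sin(w * t)` produces. The sinusoidal form is an assumption here (the thread does not show the trajectory generator), and the function names below are hypothetical. A minimal sketch that checks the closed form against autodiff:

```python
import jax
import jax.numpy as jnp


def peak_acceleration(amplitude, omega):
    """Closed-form peak |x''(t)| for x(t) = amplitude * sin(omega * t)."""
    return amplitude * omega ** 2


def numeric_peak_acceleration(amplitude, omega, n=10001):
    """Same quantity, obtained by differentiating x(t) twice with jax.grad.

    Assumes a single sinusoid; this is an illustration, not quadjax code.
    """
    x = lambda t: amplitude * jnp.sin(omega * t)
    acc = jax.vmap(jax.grad(jax.grad(x)))  # x''(t), evaluated over a batch of times
    ts = jnp.linspace(0.0, 2 * jnp.pi / omega, n)  # one full period
    return jnp.max(jnp.abs(acc(ts)))


a1_max = peak_acceleration(0.8, 1.5)  # initial A1, w1
a2_max = peak_acceleration(0.8, 3.0)  # initial A2, w2
```

Under this assumption the peak scales linearly with amplitude and quadratically with frequency, so the reduced limits could come from lowering either parameter; the thread does not say which was changed.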
Result
https://github.com/jc-bao/quadjax/assets/60093981/0c2135b2-085c-4919-9185-ac7b605aa38a
After training for more steps:
https://github.com/jc-bao/quadjax/assets/60093981/372b2176-6205-4ff6-89c6-01fd3065f27c
This is the result reported in the APG paper:
This reward makes the PPO performance degradation accountable:
reward = 0.9 - \
0.05 * err_vel - \
err_pos * 0.4 - \
jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4 - \
jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2 - \
jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1 - \
jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1
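To make the shape of this reward easier to inspect, here is a minimal sketch that wraps the same expression in a function. The name `tracking_reward` is hypothetical, and the sketch assumes `err_pos` and `err_vel` are non-negative error norms (position in meters); neither is confirmed by the thread.

```python
import jax.numpy as jnp


def tracking_reward(err_pos, err_vel):
    """Reward from the snippet above, as a standalone function.

    Assumes err_pos and err_vel are non-negative error norms.
    """
    log_err = jnp.log(err_pos + 1.0)
    return (
        0.9
        - 0.05 * err_vel
        - 0.4 * err_pos
        # Each clipped log term saturates at 1 once err_pos exceeds exp(1/k) - 1.
        - 0.4 * jnp.clip(log_err * 4, 0, 1)
        - 0.2 * jnp.clip(log_err * 8, 0, 1)
        - 0.1 * jnp.clip(log_err * 16, 0, 1)
        - 0.1 * jnp.clip(log_err * 32, 0, 1)
    )


r_perfect = tracking_reward(0.0, 0.0)  # zero error
r_30cm = tracking_reward(0.3, 0.0)     # 0.3 position error, zero velocity error
```

The four clipped terms saturate at position errors of roughly 0.284, 0.133, 0.065, and 0.032 (that is, `exp(1/k) - 1` for k = 4, 8, 16, 32). Beyond 0.284 only the linear terms still change, so at `err_pos = 0.3` with zero velocity error the reward is about -0.02, compared with 0.9 at zero error.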
https://github.com/jc-bao/quadjax/assets/60093981/7c8dd3f7-8edc-4312-b2f3-175a3de3faaf
Performance
A 30 cm tracking error is relatively large.
https://github.com/jc-bao/quadjax/assets/60093981/5d2e741e-2be4-4d0e-9a53-49616773f645
Next step