LeCAR-Lab / CoVO-MPC

Official implementation for the paper "CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design", accepted at L4DC 2024. CoVO-MPC is an optimal sampling-based MPC algorithm.
https://lecar-lab.github.io/CoVO-MPC/
Apache License 2.0

🐛 Large tracking error with PPO learned policy #11

Closed: jc-bao closed this issue 1 year ago

jc-bao commented 1 year ago

Performance

A 30 cm tracking error is relatively large.

(plot: "Copy of ppo")

https://github.com/jc-bao/quadjax/assets/60093981/5d2e741e-2be4-4d0e-9a53-49616773f645

Next step

jc-bao commented 1 year ago

Slow down the trajectory

Initial values: A1 = 0.8, w1 = 1.5 (a1_max = 1.8 m/s²); A2 = 0.8, w2 = 3.0 (a2_max = 7.2 m/s²).

Since the peak acceleration of a sinusoidal term A·sin(w·t) is A·w², halving both frequencies cuts the peak accelerations to a quarter.

Now: a1_max = 0.45 m/s², a2_max = 1.8 m/s².
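
As a sanity check, here is a minimal sketch of the slowed-down reference (assuming the reference is a sum of two sinusoids A·sin(w·t) with the amplitudes kept at 0.8, which would put the new frequencies at w1 = 0.75 and w2 = 1.5; illustrative only, not the actual quadjax trajectory code):

    import jax.numpy as jnp

    def reference_pos(t, A1=0.8, w1=0.75, A2=0.8, w2=1.5):
        # Two-sinusoid reference; each term's peak acceleration is A * w**2.
        return A1 * jnp.sin(w1 * t) + A2 * jnp.sin(w2 * t)

    # Peak accelerations for the slowed-down setting:
    # a1_max = 0.8 * 0.75**2 = 0.45 m/s^2
    # a2_max = 0.8 * 1.5**2  = 1.8  m/s^2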

Result

(plot: "ppo")

https://github.com/jc-bao/quadjax/assets/60093981/0c2135b2-085c-4919-9185-ac7b605aa38a

After training more steps:

https://github.com/jc-bao/quadjax/assets/60093981/372b2176-6205-4ff6-89c6-01fd3065f27c

(plot: "Copy of Copy of ppo")

Conclusion

jc-bao commented 1 year ago

Others' PPO performance

This is the result reported in the APG paper:

(image: reported results from the APG paper)

This makes the PPO performance degradation here explainable: it is in line with what others report.

jc-bao commented 1 year ago

Simple reward engineering


    import jax.numpy as jnp

    def tracking_reward(err_pos, err_vel):
        # Base reward of 0.9, minus a small velocity penalty, a linear
        # position penalty, and stacked log-clip terms whose larger
        # multipliers saturate at smaller errors, sharpening the gradient
        # near zero position error.
        return 0.9 - \
            0.05 * err_vel - \
            err_pos * 0.4 - \
            jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4 - \
            jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2 - \
            jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1 - \
            jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1
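
For a quick sanity check of the shaping (illustrative evaluation only, using the `tracking_reward` function above):

    # err_pos = err_vel = 0 gives the maximum reward of 0.9 (log(1) = 0).
    # The more aggressive log-clip terms saturate within a few centimeters
    # of position error, so the gradient is steepest near zero.
    for e in [0.0, 0.05, 0.1, 0.3]:
        print(e, float(tracking_reward(jnp.array(e), jnp.array(0.0))))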

(plot: "ppo")

https://github.com/jc-bao/quadjax/assets/60093981/7c8dd3f7-8edc-4312-b2f3-175a3de3faaf

jc-bao commented 1 year ago

Conclusion