NVlabs / DiffRL

[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation
https://short-horizon-actor-critic.github.io/

Brax implementation #10

Open EelcoHoogendoorn opened 1 year ago

EelcoHoogendoorn commented 1 year ago

Thought I'd mention: there is a Brax implementation of SHAC here

I suppose it's hard to compare directly since the envs are not identical, but if one of the original authors of SHAC could review it and check that it agrees with the algorithm as intended, that'd be super useful.

eanswer commented 1 year ago

Thanks for bringing this to our attention. The discussion in this thread is indeed exciting, and we would very much love to see SHAC work in other simulators as well. Due to the differences between envs and simulators, some parameter tuning might be needed, though.

ViktorM commented 1 year ago

It's an interesting thread, thank you. Simulator differences and the way contacts are handled can change a lot of things and make SHAC training more challenging. How does your implementation perform with the DiffRL repo?

EelcoHoogendoorn commented 1 year ago

Well, I'm not a Brax author or the author of this PR; just someone with an interest in differentiable physics and control. I'm about to get started on my own JAX implementation of SHAC for my own relatively easily differentiable simulations.

However, one twist is that mine is a real-world application with very much partial observability and the likely need for a recurrent controller/state estimator to do any good. It'd be interesting to figure out the details of SHAC in a recurrent context, when it's not just the physics that makes BPTT problematic. If anything, I would expect the benefits of SHAC to be more pronounced, but I guess I'll find out. If you are aware of any follow-up research along these lines (involving RNN controllers), I'd love to hear about it.
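To make the recurrent idea concrete, here is a rough JAX sketch of a short-horizon loss with an RNN controller under partial observability. It is not the DiffRL or Brax code: the toy point-mass simulator, the reward, the linear critic, the tiny tanh RNN cell, and all names and constants (`toy_step`, `observe`, `GAMMA`, `HORIZON`, ...) are assumptions made up for illustration. It only shows the shape of the idea: backpropagate through a short window of physics and recurrence, bootstrap the tail with a critic, and cut gradients at the window start.

```python
import jax
import jax.numpy as jnp

GAMMA = 0.99   # discount factor (assumed value)
HORIZON = 8    # short rollout horizon h (assumed value)

def toy_step(state, action):
    """Hypothetical differentiable 'simulator': a damped point mass."""
    vel = 0.95 * state[1] + 0.1 * action
    pos = state[0] + 0.1 * vel
    return jnp.stack([pos, vel])

def observe(state):
    """Partial observability: the controller only sees the position."""
    return state[:1]

def policy(params, hidden, obs):
    """Tiny recurrent controller: tanh RNN cell plus a linear action head."""
    hidden = jnp.tanh(params["w_h"] @ hidden + params["w_o"] @ obs)
    action = params["w_a"] @ hidden
    return hidden, action

def critic(value_params, state):
    """Linear stand-in for a learned terminal value function."""
    return value_params @ state

def short_horizon_loss(params, value_params, state0, hidden0):
    # Cut gradients at the window start, for the physics state and (by
    # assumption, for the recurrent case) the hidden state. This is a no-op
    # when differentiating only w.r.t. params, as below, but matters if
    # several windows are chained inside one differentiated function.
    state0 = jax.lax.stop_gradient(state0)
    hidden0 = jax.lax.stop_gradient(hidden0)

    def body(carry, t):
        state, hidden = carry
        hidden, action = policy(params, hidden, observe(state))
        next_state = toy_step(state, action)
        reward = -(next_state[0] ** 2) - 1e-3 * action ** 2
        return (next_state, hidden), GAMMA ** t * reward

    (state_h, _), rewards = jax.lax.scan(
        body, (state0, hidden0), jnp.arange(HORIZON))
    # Discounted rewards over the window plus a bootstrapped terminal value.
    ret = rewards.sum() + GAMMA ** HORIZON * critic(value_params, state_h)
    return -ret

# Usage: gradient of the loss w.r.t. the controller parameters only.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    "w_h": 0.1 * jax.random.normal(k1, (4, 4)),
    "w_o": 0.1 * jax.random.normal(k2, (4, 1)),
    "w_a": 0.1 * jax.random.normal(k3, (4,)),
}
value_params = jnp.zeros(2)
state0 = jnp.array([1.0, 0.0])
hidden0 = jnp.zeros(4)
loss, grads = jax.value_and_grad(short_horizon_loss)(
    params, value_params, state0, hidden0)
```

Whether detaching the hidden state at window boundaries is the right choice for a state estimator is exactly the open question; the sketch just picks the simplest option.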

In any case, those are just some random thoughts; but congrats on what I see as a very impactful and well-written paper. I was just looking at your OpenReview page, and it's always such a drag... I do think your paper should get recognition: its approach is really unique, I think, in terms of what is attainable relative to the minimal amount of conceptual complexity and code required (at least if you take a differentiable simulation env as a given, which I do).