NVlabs / DiffRL

[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation
https://short-horizon-actor-critic.github.io/

Example Request - Cartpole/Ant using Warp instead of dFlex #2

Open korzen opened 2 years ago

korzen commented 2 years ago

Hi, first of all, thanks a lot for your great piece of software! We are really excited to apply it in our research in surgical simulation. However, we ran into problems while trying to switch from dFlex to its successor, Warp.

Would it be possible to provide us with a minimal reference Cartpole and/or Ant SHAC example using Warp? That would be very helpful not only for our group but also for other users.

Thank you

mmacklin commented 2 years ago

Hi @korzen, thanks for the message, we are working on exactly what you asked for and should have something to share soon.

Adding @ViktorM who may be able to provide more information.

Best, Miles

ViktorM commented 2 years ago

Hi @korzen,

I have these environments implemented with Warp internally. I can train them with PPO and am working to get them trained with SHAC. As for the release timeline, I need to discuss it with @mmacklin.

Best, Viktor

korzen commented 2 years ago

Hi @mmacklin and @ViktorM, and thanks a lot for your replies. Please let us know the estimated "official" release date. We will keep digging on our side, but if you have any ad hoc code snippet that you could share right away, we would be very grateful.

In parallel, we will proceed with porting my surgical simulations from Unity's Burst to Warp.

Thanks a lot again for your awesome work. We are really looking forward to your sample code.

mmacklin commented 2 years ago

We'll aim to get the main RL examples shipped and training with PPO within the next month. From there, we'll look at doing the full SHAC training using gradients.

Looking forward to what you do with Warp. Please let us know if you have any other issues in the meantime.

Cheers, Miles

korzen commented 1 year ago

Hi @eanswer, was this really completed? Could you please point me to an implementation of SHAC using Warp as the simulation backend? Thanks!