SHAC and policies for partial observability

NVlabs / DiffRL

[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation

https://short-horizon-actor-critic.github.io/

SHAC and policies for partial observability #6

Closed EelcoHoogendoorn closed 1 year ago

EelcoHoogendoorn commented 2 years ago

I was wondering whether you have tried combining SHAC with an LSTM or transformer policy, or any policy that can reason over a history of states rather than just the current one, as is desirable when the state is only partially observable.

While conceptually it does not sound too complicated, I know that getting the implementation details right can be tricky for something like PPO. I was curious whether you have attempted anything like this and, if so, whether you ran into any issues.

eanswer commented 2 years ago

We didn't try either LSTM or transformer models, since previous work has shown that simple MLP models work reasonably well on the tasks we considered. In more complicated tasks, those more advanced architectures may have their benefits, and we agree that incorporating them is not trivial and requires a careful implementation.

EelcoHoogendoorn commented 2 years ago

Thanks for the fast response; good to know that at least I won't be reinventing the wheel if I do try such an implementation. It would be quite some work, but conceptually I do not expect any obstacles. Would you?

eanswer commented 2 years ago

You are right. I think there should be no big obstacles.
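
For readers who want to try this, here is a minimal sketch of the idea discussed in this thread: a recurrent policy trained by backpropagating through a short rollout of a differentiable simulator. It is not taken from the DiffRL codebase. `RecurrentActor`, `toy_step`, and `short_horizon_loss` are hypothetical names, the point-mass dynamics are a stand-in for a real differentiable simulator, and SHAC's critic (the terminal value added at the end of each window) is left out to keep the example short.

```python
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Hypothetical GRU policy: (observation, hidden) -> (action, new hidden)."""

    def __init__(self, obs_dim, act_dim, hidden_dim=32):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs, h):
        h = self.cell(obs, h)
        return torch.tanh(self.head(h)), h


def toy_step(state, action):
    """Stand-in differentiable dynamics: a batch of 1D point masses.

    state[:, 0] is position, state[:, 1] is velocity (assumption for this sketch).
    """
    pos, vel = state[:, :1], state[:, 1:]
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    reward = -(pos ** 2).squeeze(-1)  # reward for staying near the origin
    return torch.cat([pos, vel], dim=-1), reward


def short_horizon_loss(actor, state, h, horizon=8, gamma=0.99):
    """Roll out `horizon` steps; gradients flow through dynamics and the GRU."""
    total = torch.zeros(state.shape[0])
    for t in range(horizon):
        obs = state[:, :1]  # partial observability: the policy sees position only
        action, h = actor(obs, h)
        state, reward = toy_step(state, action)
        total = total + (gamma ** t) * reward
    # SHAC would add a critic's value of the final state here; omitted.
    return -total.mean(), state, h


torch.manual_seed(0)
actor = RecurrentActor(obs_dim=1, act_dim=1)
opt = torch.optim.Adam(actor.parameters(), lr=1e-2)
state = torch.randn(16, 2)   # 16 parallel environments
h = torch.zeros(16, 32)      # one hidden state per environment
losses = []
for it in range(5):
    loss, state, h = short_horizon_loss(actor, state, h)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Carry values into the next window but cut the graph, so each
    # backward pass covers exactly one short-horizon window.
    state, h = state.detach(), h.detach()
    losses.append(loss.item())
```

The main bookkeeping beyond an MLP policy is the hidden state: it is carried across window boundaries by value and detached together with the simulator state, which amounts to truncated backpropagation through time. In a full implementation the hidden state of an environment would presumably also need to be zeroed whenever that environment resets.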