Closed EelcoHoogendoorn closed 1 year ago
We didn't try either LSTM or transformer models, since previous work has shown that simple MLP models work reasonably well on the tasks we considered. In more complicated tasks, those more advanced architectures may have their benefits, and we agree that incorporating them is not trivial and requires a careful implementation.
Thanks for the fast response; good to know that at least I won't be reinventing the wheel if I do try out such an implementation. It would be quite some work, but conceptually I do not expect any obstacles. Would you?
You are right. I think there should be no big obstacles.
I was wondering whether you have made any attempt at combining SHAC with an LSTM or transformer policy, or any policy that can effectively reason about a history of states rather than just the current one, as is desirable, for instance, when dealing with partial observability of the state.
While conceptually it does not sound too complicated, I know that getting the implementation details right can be tricky for something like PPO, and I was curious whether you have attempted any such thing and, if so, whether you ran into any issues.
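To make "reasoning about some history of states" concrete: the simplest variant, short of an LSTM or transformer, is to stack the last few observations and feed the stacked vector to an ordinary MLP policy. The sketch below shows only that buffering logic in plain Python. It is not taken from the SHAC code, and `ObservationHistory`, `obs_dim`, and `horizon` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Keeps the last `horizon` observations and returns them as one flat vector.

    Hypothetical helper for illustration; not part of SHAC.
    """

    def __init__(self, obs_dim: int, horizon: int) -> None:
        self.obs_dim = obs_dim
        self.horizon = horizon
        self._buffer: Deque[List[float]] = deque(maxlen=horizon)
        self.reset()

    def reset(self) -> None:
        # Zero-pad so the stacked vector has a fixed size from the first step.
        self._buffer.clear()
        for _ in range(self.horizon):
            self._buffer.append([0.0] * self.obs_dim)

    def push(self, obs: Sequence[float]) -> List[float]:
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        # deque(maxlen=...) drops the oldest observation automatically.
        self._buffer.append(list(obs))
        # Flatten with the oldest observation first and the newest last.
        return [x for frame in self._buffer for x in frame]


history = ObservationHistory(obs_dim=2, horizon=3)
history.push([1.0, 2.0])
stacked = history.push([3.0, 4.0])
print(stacked)  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

In an actual training loop the buffer would presumably hold tensors rather than Python lists, so that gradients can flow through the stacked observations, and it would need to be reset at episode boundaries. A recurrent policy replaces this fixed window with a learned hidden state, which is where the trickier implementation details come in.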