Open Liuxueyi opened 2 weeks ago
@Liuxueyi Can I ask a question? What is the purpose of this kind of RL trajectory task? Do the input and output both contain the actions at each timestep?
You can read the paper "Decision Transformer: Reinforcement Learning via Sequence Modeling". The method treats the RL decision task as a sequence modeling problem. The input is an RL trajectory and the output is an action.
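To make the input/output split concrete, here is a minimal sketch of how a Decision Transformer style training example can be laid out. It assumes the paper's token order of (return-to-go, state, action) per timestep; the helper names `returns_to_go` and `build_example` and the string tags are hypothetical, chosen only for illustration, and are not taken from this repository.

```python
from typing import Any, List, Sequence, Tuple

Token = Tuple[str, Any]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_example(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    t: int,
) -> Tuple[List[Token], Any]:
    """Return (input_tokens, target_action) for predicting the action at step t.

    The input holds the full (rtg, state, action) triples for steps before t,
    plus the rtg and state at step t. The action at step t is the target,
    so it is deliberately left out of the input.
    """
    rtg = returns_to_go(rewards)
    tokens: List[Token] = []
    for k in range(t):
        tokens += [("rtg", rtg[k]), ("state", states[k]), ("action", actions[k])]
    tokens += [("rtg", rtg[t]), ("state", states[t])]
    return tokens, actions[t]


if __name__ == "__main__":
    tokens, target = build_example(
        states=["s0", "s1", "s2"],
        actions=["a0", "a1", "a2"],
        rewards=[1.0, 0.0, 2.0],
        t=1,
    )
    print(tokens)
    print(target)
```

So past actions do appear in the input, but the action being predicted does not. In an actual model the tokens would be embeddings and a causal mask lets all timesteps be trained in one pass; the sketch builds a single example only to show what is input and what is target.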
@Liuxueyi Thanks for your reply, but the input trajectory includes (state, action, reward) tuples.
Hi, I followed your architecture and ran code based on https://github.com/Toshihiro-Ota/decision-mamba, but the training time is unacceptable: one epoch takes 8 hours. Do you have any suggestions, and when will you release the code?