AndyCao1125 / MambaDM


About the training time #1

Open Liuxueyi opened 2 weeks ago

Liuxueyi commented 2 weeks ago

Hi, I followed your architecture and ran the code based on https://github.com/Toshihiro-Ota/decision-mamba, but the training time is unacceptably long: one epoch takes 8 hours. Do you have any suggestions, and when will you release the code?

rginjapan commented 2 weeks ago

@Liuxueyi Can I ask a question? What is the purpose of this kind of RL trajectory task? Do the input and the output both contain the actions at each timestep?

Liuxueyi commented 1 week ago

You can read the paper "Decision Transformer: Reinforcement Learning via Sequence Modeling". The method casts the RL decision task as a sequence modeling problem. The input is an RL trajectory and the output is an action.
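
To make the input side concrete, here is a minimal sketch of how a trajectory could be flattened into one token sequence. It assumes the Decision Transformer convention of conditioning on returns-to-go (the sum of future rewards) rather than raw rewards; the function names and the toy data are made up for illustration and are not taken from this repository.

```python
def returns_to_go(rewards):
    """Suffix sums of rewards: rtg[t] = r[t] + r[t+1] + ... + r[T-1]."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]


def build_sequence(states, actions, rewards):
    """Interleave tokens as (rtg_0, s_0, a_0, rtg_1, s_1, a_1, ...)."""
    rtg = returns_to_go(rewards)
    tokens = []
    for t in range(len(states)):
        tokens.append(("rtg", t, rtg[t]))
        tokens.append(("state", t, states[t]))
        tokens.append(("action", t, actions[t]))
    return tokens


# Toy trajectory with three timesteps (placeholder values).
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]
rewards = [1.0, 0.0, 2.0]

seq = build_sequence(states, actions, rewards)
for token in seq:
    print(token)
```

The sequence model is then trained to predict the action token at each timestep from the tokens that precede it.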

rginjapan commented 1 week ago

@Liuxueyi Thanks for your reply, but the input trajectory includes (state, action, reward) tuples.
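
One way to look at this, as a rough sketch: the trajectory does contain actions, but a causal sequence model predicts each action only from the tokens that come before it, so the action being predicted is never part of its own input. The snippet below assumes the Decision Transformer token order (return-to-go, state, action) per timestep; `visible_context` and the `R`/`s`/`a` labels are hypothetical names used only for illustration.

```python
def visible_context(num_steps, t):
    """Token labels a causal model may attend to when predicting action a_t."""
    tokens = []
    for k in range(num_steps):
        tokens += [f"R{k}", f"s{k}", f"a{k}"]
    # a_t sits at index 3*t + 2, so everything before that index is visible.
    return tokens[: 3 * t + 2]


# When predicting a1, the model sees earlier actions (a0) but not a1 itself.
print(visible_context(3, 1))  # ['R0', 's0', 'a0', 'R1', 's1']
```

So past actions are inputs, while the action at the current timestep is the prediction target.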