ErinTUDelft opened 7 months ago
Really sorry for the delayed reply. I have been busy with another project lately.
Unfortunately, we found that TDMPC (our implementation) performs poorly on tasks beyond hovering and simple tracking. The critical issue is that planning is too time-consuming to be used in a massively parallelized setting. Therefore, the algorithm did not appear in the paper and is no longer maintained. I would suggest not using it unless you have a specific interest in model-based methods.
Oh that is a shame to hear, I was quite excited to try this algorithm! In their original paper they do mention the following concerning the planning though:
"For example, we found in Figure 8 that we can reduce the planning horizon of TD-MPC on the Quadruped Run task from 5 to 1 with no significant reduction in performance, which reduces inference time to approximately 12ms per step"
Was using a horizon length of 1 instead of 5 something you tried? Their new TDMPC2 also claims to be faster, and since I'm doing a side project on model-based RL I would still like to try to implement it! Have there been many backward-compatibility-breaking changes in the meantime, or would it still be possible to run the tdmpc algorithm?
I think the main issue is having to perform CEM planning for all the parallel environments. For example, you need to sample 4096 * num_samples * planning_horizon actions and roll out the learned model at each step.
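To make the cost concrete, here is a minimal sketch of one CEM planning step over many parallel environments. All names and numbers (the toy model, `num_samples`, the elite count, the CEM hyperparameters) are illustrative assumptions, not the library's actual implementation; only the 4096-environment figure comes from the discussion above.

```python
import numpy as np

# Illustrative sizes: 4096 parallel envs; the rest are assumptions.
num_envs, num_samples, horizon, act_dim = 4096, 32, 5, 4

def toy_model_rollout(actions):
    """Stand-in for the learned dynamics model: scores each candidate
    action sequence. actions: (num_envs, num_samples, horizon, act_dim)."""
    return -np.square(actions).sum(axis=(-1, -2))  # (num_envs, num_samples)

# Per-environment Gaussian over action sequences.
mean = np.zeros((num_envs, horizon, act_dim))
std = np.ones((num_envs, horizon, act_dim))

for _ in range(3):  # a few CEM iterations
    # Every planner step samples num_envs * num_samples * horizon actions...
    actions = mean[:, None] + std[:, None] * np.random.randn(
        num_envs, num_samples, horizon, act_dim)
    returns = toy_model_rollout(actions)  # ...and rolls out the model on all of them.
    # Refit the Gaussian to the top-k (elite) sequences per environment.
    k = 6
    elite_idx = np.argsort(returns, axis=1)[:, -k:]
    elites = np.take_along_axis(actions, elite_idx[..., None, None], axis=1)
    mean, std = elites.mean(axis=1), elites.std(axis=1) + 1e-6

first_action = mean[:, 0]  # the action executed in each of the 4096 envs
```

Even this toy version samples 4096 * 32 * 5 = 655,360 actions per iteration, which is why planning dominates the step time in a massively parallel setting.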
However, I do think the capability of collecting a large amount of data is favorable for model-based RL; it's just that some modifications are needed.
The algorithm as implemented is no longer compatible. But it should not be too hard to make it run again, since the data collection interface remains the same. Possibly you just need to change some keys of the tensordicts.
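By way of illustration, the key change might look like the sketch below, using a plain dict of tensors as a stand-in for a tensordict. The old and new key names here are made up; you would need to match them against what the current data collection code actually emits (the tensordict library also provides rename methods for this purpose).

```python
# Hypothetical key mapping from an old schema to a new one; the names
# below are illustrative, not the library's real keys.
KEY_MAP = {
    "drone.obs": "agents.observation",
    "drone.action": "agents.action",
}

def remap_keys(td: dict) -> dict:
    """Return a copy of a dict-like batch with keys renamed per KEY_MAP;
    keys not in the map are kept as-is."""
    return {KEY_MAP.get(k, k): v for k, v in td.items()}

batch = {"drone.obs": [0.1, 0.2], "reward": [1.0]}
batch = remap_keys(batch)
```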
TypeError: TDMPCPolicy.__init__() got multiple values for argument 'device'
@ErinTUDelft Hello, have you solved this bug in the tdmpc algorithm yet? I've run other algorithms and I'm getting the same error as yours:
TypeError: XXXXPolicy.__init__() got multiple values for argument 'device'
How did you get this algorithm to work? Looking forward to your answer. Thanks!
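For what it's worth, this error is plain Python rather than anything library-specific: it means `device` is being supplied both positionally and as a keyword, e.g. by a factory that forwards `*args` and also injects `device=...`. A minimal reproduction with made-up names (`DemoPolicy` is not the library's actual class):

```python
class DemoPolicy:
    """Stand-in for an algorithm's Policy class; not the library's real API."""
    def __init__(self, cfg, device="cpu"):
        self.cfg, self.device = cfg, device

# Passing a value positionally for `device` AND again as a keyword
# reproduces the error; the fix is to pass it only one way.
try:
    DemoPolicy({"algo": "tdmpc"}, "cuda", device="cuda")
except TypeError as err:
    message = str(err)  # "... got multiple values for argument 'device'"

ok = DemoPolicy({"algo": "tdmpc"}, device="cuda")  # correct call
```

So the place to look is wherever the policy is constructed: either drop the positional argument or the `device=` keyword in that call.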
Thank you for creating this library; it really is amazing! However, when running train.py (algo=tdmpc, task=Hover) from /scripts I get the following error:
When running the same command from /scripts_paper I get this error:
Which of the two training files would you recommend using? And how can I resolve this issue? Kind regards, Erin