btx0424 / OmniDrones

https://omnidrones.readthedocs.io/
MIT License

TDMPC Error #67

Open ErinTUDelft opened 5 months ago

ErinTUDelft commented 5 months ago

Thank you for creating this library; it really is amazing! However, when running train.py (algo=tdmpc, task=Hover) from /scripts I get the following error:

Traceback (most recent call last):
  File "/home/erin/Thesis/OmniDrones/scripts/train.py", line 104, in main
    policy = ALGOS[cfg.algo.name.lower()](
TypeError: TDMPCPolicy.__init__() got multiple values for argument 'device'
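If I read this correctly, device is reaching TDMPCPolicy.__init__ twice: once through positional unpacking and once as an explicit keyword. A minimal sketch of that failure mode (the signature below is made up for illustration, not the actual OmniDrones one):

# Illustrative stand-in, not the real OmniDrones signature.
class TDMPCPolicy:
    def __init__(self, cfg, agent_spec, device="cuda"):
        self.cfg, self.agent_spec, self.device = cfg, agent_spec, device

args = ({"algo": "tdmpc"}, "agent_spec", "cuda")  # third value already fills `device`
TDMPCPolicy(*args, device="cuda")  # TypeError: got multiple values for argument 'device'

So presumably one of the two has to be dropped at the call site in train.py.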

When running the same command from /scripts_paper I get this error:

 File "/home/erin/anaconda3/envs/OmniDrones310/lib/python3.10/copy.py", line 161, in deepcopy
  File "/home/erin/Thesis/OmniDrones/scripts_paper/train.py", line 172, in main
    collector = SyncDataCollector(
  File "/home/erin/Thesis/Utils/rl/torchrl/collectors/collectors.py", line 539, in __init__
    (self.policy, self.get_weights_fn,) = self._get_policy_and_device(
  File "/home/erin/Thesis/Utils/rl/torchrl/collectors/collectors.py", line 178, in _get_policy_and_device
    policy = deepcopy(policy)
----------
  File "/home/erin/anaconda3/envs/OmniDrones310/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'omni.kit.app._app.IApp' object
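The second error looks like deepcopy choking on a native handle: deepcopy serializes objects through their __reduce_ex__ hook (the rv = reductor(4) frame above), and the Kit app object refuses to be pickled. A tiny reproduction with a made-up stand-in for the app handle:

import copy

class UnpicklableHandle:
    """Stand-in for omni.kit.app._app.IApp: a native object that cannot be pickled."""
    def __reduce__(self):
        raise TypeError("cannot pickle 'UnpicklableHandle' object")

class Policy:
    def __init__(self):
        self.app = UnpicklableHandle()  # e.g. a reference to the running sim app

copy.deepcopy(Policy())  # TypeError: cannot pickle 'UnpicklableHandle' object

So presumably the policy (or something it references) holds a handle to the running simulation app, and the deepcopy inside SyncDataCollector trips over it. Judging from the collectors.py frames, the copy seems to happen when the collector moves the policy to its device, so keeping the two on the same device might sidestep it, though I have not verified that.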

Which of the two training files would you recommend using? And how can I resolve this issue? Kind regards, Erin

btx0424 commented 5 months ago

Really sorry for the delayed reply. I have been busy with another project lately.

Unfortunately, we found that TDMPC (our implementation) performs poorly on tasks beyond hovering and simple tracking. The critical issue is that planning is too time-consuming to be used in a massively parallelized setting. Therefore, the algorithm did not appear in the paper and is no longer maintained. I would suggest not using it unless you have a specific interest in model-based methods.

ErinTUDelft commented 5 months ago

Oh, that is a shame to hear; I was quite excited to try this algorithm! In the original paper, though, the authors do mention the following about planning time:

"For example, we found in Figure 8 that we can reduce the planning horizon of TD-MPC on the Quadruped Run task from 5 to 1 with no significant reduction in performance, which reduces inference time to approximately 12ms per step"

Is using a horizon length of 1 instead of 5 something you have tried? Their new TDMPC2 also claims to be faster, and since I'm doing a side project on model-based RL, I would still like to try to implement it! Have there been many backward-incompatible changes in the meantime, or would it still be possible to run the tdmpc algorithm?

btx0424 commented 5 months ago

I think the main issue is having to perform CEM planning for all the parallel environments. For example, you need to sample 4096 * num_samples * planning_horizon actions and roll out the learned model at every step.
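To put rough numbers on it (512 samples and a horizon of 5 are typical TD-MPC defaults, assumed here for illustration):

import torch

num_envs, num_samples, horizon, action_dim = 4096, 512, 5, 4

# One CEM planning step must sample candidate action sequences for every env...
candidate_actions = torch.randn(num_envs, num_samples, horizon, action_dim)
print(candidate_actions.numel())  # ~42 million action values per planning step

# ...and roll each of the num_envs * num_samples sequences through the learned
# latent model for `horizon` steps, repeated over several CEM iterations,
# at every single control step.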

However, I do think the ability to collect a large amount of data is favorable for model-based RL. It's just that some modifications are needed.

The implemented algorithm is no longer compatible. But it should not be too hard to make it run again, since the data collection interface remains the same. Possibly you just need to change some keys of the tensordicts.
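For example, something along these lines, where both key names are hypothetical stand-ins (check the actual names against the env's observation spec):

import torch
from tensordict import TensorDict

# Hypothetical keys for illustration; verify against the env's specs.
td = TensorDict(
    {("agents", "observation"): torch.zeros(4096, 1, 42)},
    batch_size=[4096],
)
# Map the key the envs emit now onto the name the old TDMPC code expects.
td.rename_key_(("agents", "observation"), "drone.obs")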

Sgmltd commented 2 months ago

TypeError: TDMPCPolicy.__init__() got multiple values for argument 'device'

@ErinTUDelft Hello, have you since solved this bug in the tdmpc algorithm? I've run other algorithms and I'm getting the same error as yours: TypeError: XXXXPolicy.__init__() got multiple values for argument 'device'

How did you get this algorithm to work? Looking forward to your answer. Thanks!