dfki-ric-underactuated-lab / double_pendulum

Dual purpose Acrobot and Pendubot Platform
BSD 3-Clause "New" or "Revised" License

issue in train_sac.py #14

Open · Astik-2002 opened this issue 1 month ago

Astik-2002 commented 1 month ago

While training SAC, the following error occurred:


  File "/home/astik/double_pendulum/examples/reinforcement_learning/SAC/train_sac_noisy_env.py", line 357, in <module>
    agent = SAC(
  File "/home/astik/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/sac/sac.py", line 106, in __init__
    super(SAC, self).__init__(
  File "/home/astik/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 107, in __init__
    super(OffPolicyAlgorithm, self).__init__(
  File "/home/astik/anaconda3/envs/drones/lib/python3.10/site-packages/stable_baselines3/common/base_class.py", line 171, in __init__
    assert isinstance(self.action_space, supported_action_spaces), (
AssertionError: The algorithm only supports <class 'gym.spaces.box.Box'> as action spaces but Box(-1.0, 1.0, (1,), float32) was provided```
Astik-2002 commented 1 month ago

Also, DQN was unable to stabilize the trajectory after training for 100 epochs. Is this expected? The final output of the DQN training is linked below.

https://github.com/user-attachments/assets/1182ced0-1237-417c-8bb7-010b4907348d

fwiebe commented 1 month ago

Hi @Astik-2002 , thanks for raising the issue.

Regarding your first comment: the error occurs because the SAC example uses StableBaselines3, which in the installed version expects the old gym library, while the training environment has been migrated to the more modern gymnasium. I will check if/how StableBaselines3 can be used with gymnasium.

Regarding your second comment: yes, that is expected. The DQN implementation discretizes the state space and produces only a subpar policy.
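For reference, StableBaselines3 supports gymnasium natively from version 2.0 onward, so upgrading it is one possible workaround. Below is a minimal sketch under that assumption; `Pendulum-v1` is only a placeholder for the repository's actual training environment from `train_sac_noisy_env.py`, not the environment used in the example.

```python
# Minimal sketch, assuming stable-baselines3 >= 2.0 (gymnasium-compatible).
# Pendulum-v1 stands in for the double pendulum training environment.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")            # continuous Box action space, as SAC requires
agent = SAC("MlpPolicy", env, verbose=1)  # same constructor call that fails in the traceback
agent.learn(total_timesteps=10_000)
agent.save("sac_pendulum_sketch")
```

With an SB3 version that still targets gym, the assertion in `base_class.py` fires because the gymnasium `Box` is not an instance of `gym.spaces.box.Box`, which matches the error message above.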

Astik-2002 commented 1 month ago

Thanks for the comment. I'm having trouble understanding how the competition will be judged. Most controllers from the literature used to stabilize the acrobot or pendubot are already implemented in the realistic examples. Since they require significant parameter optimization, can a working controller from the examples be submitted to the leaderboard, even if it is not a new controller?

fwiebe commented 1 month ago

Hi @Astik-2002 , yes, it is allowed to submit a modified controller from the leaderboard, e.g. with tuned parameters, a different filtering method, etc. (see the sketch below for one example of such a modification).
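As an illustration of that kind of modification, the sketch below wraps an existing controller with an exponential low-pass filter on the commanded torques. The `get_control_output(x, t)` interface, the `LowPassFilteredController` name, and the `base_controller` object are assumptions standing in for whichever example controller is being tuned, not the repository's exact API.

```python
import numpy as np

class LowPassFilteredController:
    """Hypothetical wrapper: smooths the torque command of an existing controller.

    base_controller is assumed to expose get_control_output(x, t) -> array-like;
    adapt the method name to the actual controller class being modified.
    """

    def __init__(self, base_controller, alpha=0.3):
        self.base = base_controller
        self.alpha = alpha      # filter coefficient in (0, 1]; 1.0 disables filtering
        self.u_prev = None      # last filtered command

    def get_control_output(self, x, t):
        u = np.asarray(self.base.get_control_output(x, t), dtype=float)
        if self.u_prev is None:
            self.u_prev = u
        # exponential moving average of the torque command
        u_filt = self.alpha * u + (1.0 - self.alpha) * self.u_prev
        self.u_prev = u_filt
        return u_filt
```

The filter coefficient `alpha` would then be one of the parameters tuned for the leaderboard submission.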