PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

Bug in running had3qn + smac #6

Closed raykr closed 1 year ago

raykr commented 1 year ago

Script:

python train.py --algo had3qn --env smac --exp_name had3qn_smac_test

Error:

Traceback (most recent call last):
  File "train.py", line 87, in <module>
    main()
  File "train.py", line 82, in main
    runner.run()
  File "/home/raykr/workspace/HARL/harl/runners/off_policy_base_runner.py", line 156, in run
    obs, share_obs = self.warmup()
  File "/home/raykr/workspace/HARL/harl/runners/off_policy_base_runner.py", line 234, in warmup
    new_obs, new_share_obs, reward, done, infos, _ = self.envs.step(actions)
  File "/home/raykr/workspace/HARL/harl/envs/env_wrappers.py", line 132, in step
    return self.step_wait()
  File "/home/raykr/workspace/HARL/harl/envs/env_wrappers.py", line 317, in step_wait
    results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
  File "/home/raykr/workspace/HARL/harl/envs/env_wrappers.py", line 317, in <listcomp>
    results = [env.step(a) for (a, env) in zip(self.actions, self.envs)]
  File "/home/raykr/workspace/HARL/harl/envs/smac/StarCraft2_Env.py", line 542, in step
    sc_action = self.get_agent_action(a_id, action)
  File "/home/raykr/workspace/HARL/harl/envs/smac/StarCraft2_Env.py", line 713, in get_agent_action
    assert avail_actions[action] == 1, "Agent {} cannot perform action {}".format(
AssertionError: Agent 1 cannot perform action 0
RequestQuit command received.
Closing Application...
unable to parse websocket frame.

Meanwhile, happo, haa2c, hatrpo, and mappo all run normally on smac. Only had3qn produces this error.

Ivan-Zhong commented 1 year ago

Hello. Yes, we are aware of this. The reason is that we haven't added support for available actions in had3qn, so it cannot be used on SMAC yet. We may add support for this in the future. :)

raykr commented 1 year ago

Would it be feasible to add a get_avail_agent_actions check before sampling actions?

Ivan-Zhong commented 1 year ago

get_avail_agent_actions is a function inside the environment that determines the available actions and reports them to the runner. For had3qn agents to choose appropriate actions, a feasible way is to take the argmax over only the available actions. The training process also needs to be modified accordingly.
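The "argmax within the available actions" idea can be sketched as a masked argmax. This is only an illustrative sketch, not code from the repo; the function name and tensor shapes are assumptions:

```python
import torch

def masked_argmax(q_values, avail_actions):
    """Greedy action selection restricted to available actions.

    q_values:      (batch, n_actions) Q-value estimates
    avail_actions: (batch, n_actions) 0/1 availability mask

    Hypothetical helper, not part of HARL itself.
    """
    # Set Q-values of unavailable actions to -inf so argmax ignores them
    masked_q = q_values.clone()
    masked_q[avail_actions == 0] = -float("inf")
    return masked_q.argmax(dim=-1)
```

With this, an unavailable action can never be selected even if its raw Q-value is the largest, which is exactly the condition the SMAC assertion checks.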

raykr commented 1 year ago

I saw that it passes the warmup phase but errors in the train phase. The sample_actions used during warmup takes available_actions and restricts sampling via Categorical().sample(), but in the train phase actor.get_actions(obs, add_random) has no available_actions parameter. Does the q_net model need to be changed, like hasac's StochasticMlpPolicy?

Ivan-Zhong commented 1 year ago

Hello. To support available_actions in had3qn, the DuelingQNet model does not need to be changed. Instead, the had3qn actor and the discrete Q critic should be modified to always choose actions within the available ones.
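Putting the two pieces together, the actor-side change could look like an epsilon-greedy selection where both the greedy branch and the random branch respect the mask. This is a minimal sketch under assumed signatures (the real actor takes obs and runs DuelingQNet internally; here the Q-values are passed in directly):

```python
import torch
from torch.distributions import Categorical

def get_actions(q_values, avail_actions, add_random, epsilon=0.05):
    """Epsilon-greedy selection restricted to available actions.

    q_values:      (batch, n_actions) output of the Q network
    avail_actions: (batch, n_actions) 0/1 availability mask
    add_random:    whether to mix in epsilon-greedy exploration
    Hypothetical sketch of the proposed actor change, not HARL code.
    """
    # Greedy branch: mask out unavailable actions before argmax
    masked_q = q_values.clone()
    masked_q[avail_actions == 0] = -float("inf")
    greedy = masked_q.argmax(dim=-1)
    if not add_random:
        return greedy
    # Random branch: sample uniformly among available actions only
    probs = avail_actions.float()
    probs = probs / probs.sum(dim=-1, keepdim=True)
    random_actions = Categorical(probs=probs).sample()
    use_random = torch.rand(q_values.shape[0]) < epsilon
    return torch.where(use_random, random_actions, greedy)
```

The same masking would also need to be applied when the critic computes target Q-values, so that bootstrapped targets never credit unavailable actions.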