IDSIA / hhmarl_2D

Heterogeneous Hierarchical Multi Agent Reinforcement Learning for Air Combat

An Error is raised “Could not find L3 Fight Policy. Store in E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint” #2

Closed: ad56917783 closed this 3 months ago

ad56917783 commented 6 months ago

After training the hetero model in mode=fight and mode=escape up to L3, I tried to train it with `python train_hetero.py --epochs=10000 --restore=True --agent_mode=escape --level=4`, but it failed. An error is raised: "Could not find L3 Fight Policy. Store in E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint". However, that folder exists and contains 'E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint\policies\ac1_policy' and 'E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint\policies\ac2_policy'. Looking forward to your reply.

ad56917783 commented 6 months ago

Could you share your checkpoints?

ad56917783 commented 6 months ago

```
(RolloutWorker pid=23212) D:\anaconda3\envs\hhmarl2D\lib\site-packages\gymnasium\spaces\box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
(RolloutWorker pid=23212)   logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
(RolloutWorker pid=11116) 2023-12-18 09:05:48,965 WARNING checkpoints.py:109 -- No rllib_checkpoint.json file found in checkpoint directory E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint! Trying to extract checkpoint info from other files found in that dir.
(RolloutWorker pid=23212) 2023-12-18 09:05:49,846 ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=23212, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000233EC6700A0>)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\policy.py", line 335, in from_checkpoint
(RolloutWorker pid=23212)     policies[policy_id] = Policy.from_state(policy_state)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\policy.py", line 378, in from_state
(RolloutWorker pid=23212)     new_policy = actual_class(
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 67, in __init__
(RolloutWorker pid=23212)     self._initialize_loss_from_dummy_batch()
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\policy.py", line 1405, in _initialize_loss_from_dummy_batch
(RolloutWorker pid=23212)     actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 522, in compute_actions_from_input_dict
(RolloutWorker pid=23212)     return self._compute_action_helper(
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
(RolloutWorker pid=23212)     return func(self, *a, **k)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1162, in _compute_action_helper
(RolloutWorker pid=23212)     extra_fetches = self.extra_action_out(
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\policy\torch_mixins.py", line 166, in extra_action_out
(RolloutWorker pid=23212)     SampleBatch.VF_PREDS: model.value_function(),
(RolloutWorker pid=23212)   File "E:\hhmarl_2D\models\ac_models_hetero.py", line 500, in value_function
(RolloutWorker pid=23212)     x = self.inp1_val(self._v1)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
(RolloutWorker pid=23212)     return self._call_impl(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
(RolloutWorker pid=23212)     return forward_call(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\models\torch\misc.py", line 169, in forward
(RolloutWorker pid=23212)     return self._model(x)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
(RolloutWorker pid=23212)     return self._call_impl(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
(RolloutWorker pid=23212)     return forward_call(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
(RolloutWorker pid=23212)     input = module(input)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
(RolloutWorker pid=23212)     return self._call_impl(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
(RolloutWorker pid=23212)     return forward_call(*args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
(RolloutWorker pid=23212)     return F.linear(input, self.weight, self.bias)
(RolloutWorker pid=23212) RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x130 and 132x500)
(RolloutWorker pid=23212)
(RolloutWorker pid=23212) During handling of the above exception, another exception occurred:
(RolloutWorker pid=23212)
(RolloutWorker pid=23212) ray::RolloutWorker.__init__() (pid=23212, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000233EC6700A0>)
(RolloutWorker pid=23212)   File "python\ray\_raylet.pyx", line 870, in ray._raylet.execute_task
(RolloutWorker pid=23212)   File "python\ray\_raylet.pyx", line 921, in ray._raylet.execute_task
(RolloutWorker pid=23212)   File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
(RolloutWorker pid=23212)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
(RolloutWorker pid=23212)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
(RolloutWorker pid=23212)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=23212)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 609, in __init__
(RolloutWorker pid=23212)     self.env = env_creator(copy.deepcopy(self.env_context))
(RolloutWorker pid=23212)   File "D:\anaconda3\envs\hhmarl2D\lib\site-packages\ray\rllib\env\utils.py", line 133, in _gym_env_creator
(RolloutWorker pid=23212)     env = env_descriptor(env_context)
(RolloutWorker pid=23212)   File "E:\hhmarl_2D\envs\env_hetero.py", line 86, in __init__
(RolloutWorker pid=23212)     self.sp_opp = self._get_policy()
(RolloutWorker pid=23212)   File "E:\hhmarl_2D\envs\env_hetero.py", line 307, in _get_policy
(RolloutWorker pid=23212)     raise NameError(f'Could not find L3 Fight Policy. Store in {check_path}')
(RolloutWorker pid=23212) NameError: Could not find L3 Fight Policy. Store in E:\hhmarl_2D\results\Level3_fight_2vs2\checkpoint
```

YangRongtai commented 3 months ago

I have encountered the same issue and look forward to your reply.

ardian-selmonaj commented 3 months ago

Hello both, I'm sorry for the long wait. @ad56917783, I had overlooked your message... Your configuration --agent_mode=escape --level=4 is not meant to be used like this. Escape should only be trained up to L3, because that is sufficient for learning to flee from opponents. However, if you want to change that, look at:

envs > env_base.py > def _get_policies()

In this method you can modify which policies are used in curriculum learning / self-play (a hypothetical sketch of such loading logic follows the steps below). Please also update to the new version on git and read the description carefully. To obtain the fully trained model:

1) Train escape; level 3 will be set automatically in config.py.
2) Train fight from L1 to L5.
3) Train the hierarchical policy.
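For reference, here is a hypothetical sketch of the kind of checkpoint-loading logic behind _get_policies(); the function signature and the directory layout are assumptions inferred from the error message above, not the repository's actual code:

```python
# Hypothetical sketch, not the actual envs/env_base.py code: resolve and restore
# the stored low-level policy used as self-play opponent for a given level/mode.
import os
from ray.rllib.policy.policy import Policy

def load_opponent_policies(results_dir: str, level: int, mode: str = "fight"):
    """Restore the stored low-level policies used as self-play opponents."""
    check_path = os.path.join(results_dir, f"Level{level}_{mode}_2vs2", "checkpoint")
    if not os.path.isdir(check_path):
        raise NameError(f"Could not find L{level} {mode.capitalize()} Policy. Store in {check_path}")
    # Policy.from_checkpoint returns {policy_id: Policy} for multi-policy checkpoints,
    # e.g. {"ac1_policy": ..., "ac2_policy": ...} as in the folders mentioned above.
    return Policy.from_checkpoint(check_path)
```

An assumed command sequence for the three steps, using only the flags already shown in this thread (epoch counts, restore usage per level, and the hierarchical script name may differ from the actual repository):

```
python train_hetero.py --agent_mode=escape                          # step 1: level 3 set automatically
python train_hetero.py --agent_mode=fight --level=1                 # step 2: fight curriculum L1-L5
python train_hetero.py --agent_mode=fight --level=2 --restore=True
python train_hetero.py --agent_mode=fight --level=3 --restore=True
python train_hetero.py --agent_mode=fight --level=4 --restore=True
python train_hetero.py --agent_mode=fight --level=5 --restore=True
python train_hier.py                                                 # step 3: hierarchical commander policy (script name assumed)
```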

ad56917783 commented 3 months ago

Thank you for your reply. After reading the code, I figured out the right procedure and trained the models: Policy_fight from L1 to L5, Policy_escape from L1 to L3, and Policy_command. By the way, do you have plans to switch to a 3D environment?

YangRongtai commented 3 months ago

> Thank you for your reply. After reading the code, I figured out the right procedure and trained the models: Policy_fight from L1 to L5, Policy_escape from L1 to L3, and Policy_command. By the way, do you have plans to switch to a 3D environment?

How are the parameters in config_hier.py set when training Policy_command?

ardian-selmonaj commented 3 months ago

> Thank you for your reply. After reading the code, I figured out the right procedure and trained the models: Policy_fight from L1 to L5, Policy_escape from L1 to L3, and Policy_command. By the way, do you have plans to switch to a 3D environment?

> How are the parameters in config_hier.py set when training Policy_command?

In the new version on git, there is no config_hier.py; there is only one config.py file for both trainings. There are not many changes in the config for hierarchical training, mainly the map size and the number of agents.
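Purely as an illustration of that point, the hierarchical-training config might differ in entries like the following (attribute names and values are assumptions, not the actual contents of config.py):

```python
# Hypothetical illustration; attribute names and values are assumptions,
# not the actual config.py of the repository.
class HierTrainingConfig:
    def __init__(self):
        self.num_agents = 3   # more friendly aircraft than in low-level 2-vs-2 training (assumed)
        self.num_opps = 3     # opponent aircraft count (assumed)
        self.map_size = 0.5   # larger map for the commander scenario (assumed value)
```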

The 3D env is in development; it will still take some time. A first version should be available around September '24.

YangRongtai commented 3 months ago

Thank you very much for your response, it is very helpful to me. Additionally, I have another small question. In the new version, when training the low-level policy, for example at level=1 with agent_mode=fight, do we need to train against both opp_mode=fight and opp_mode=escape?

ardian-selmonaj commented 3 months ago

> Thank you very much for your response, it is very helpful to me. Additionally, I have another small question. In the new version, when training the low-level policy, for example at level=1 with agent_mode=fight, do we need to train against both opp_mode=fight and opp_mode=escape?

No, this is not necessary. In Level 5, the opponent is randomly assigned to fight or escape, so the agent trains against both fight and escape behavior simultaneously.
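A minimal sketch of that idea (not the repository's actual code): the opponent behaviour could be drawn at each episode reset, so the fight agent automatically faces both modes over the course of training.

```python
# Minimal sketch (not the repository's actual code): sample the opponent mode per episode.
import random

def sample_opponent_mode() -> str:
    return random.choice(["fight", "escape"])  # re-drawn at every episode reset
```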

YangRongtai commented 3 months ago

> Thank you very much for your response, it is very helpful to me. Additionally, I have another small question. In the new version, when training the low-level policy, for example at level=1 with agent_mode=fight, do we need to train against both opp_mode=fight and opp_mode=escape?

> No, this is not necessary. In Level 5, the opponent is randomly assigned to fight or escape, so the agent trains against both fight and escape behavior simultaneously.

I understand now; I will continue studying your code. Thank you for your helpful answers, and I wish you good luck!