Shanghai-Digital-Brain-Laboratory / DB-Football

A Simple, Distributed and Asynchronous Multi-Agent Reinforcement Learning Framework for Google Research Football AI.

ValueError in training #6

Open Jay-Vim-Lv opened 1 year ago

Jay-Vim-Lv commented 1 year ago

Hi, when I tried to replicate your results, I ran into an issue. I cannot find where the problem is or how to solve it. Could you help me? My environment is built as you recommend, and the system is Ubuntu 18.04 LTS with 2 GPUs: a 1080Ti and a Titan X. In the code, I only modified 'num_workers' and 'batch_size' in the YAML file to match my hardware. When I run python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml, it generates the following error message: ` (/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml [2023-09-28 09:18:34,036][WARNING] No active cluster detected, will create local ray instance. [2023-09-28 09:18:44,991][WARNING] ============== Cluster Info ============== {'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469', 'metrics_export_port': 55494, 'node_id': 'a8211a7e16deb107246a6dfd4b68c7d43f1a31ddb9fdba7c482c3b64'} [2023-09-28 09:18:44,993][WARNING] * cluster resources: {'accelerator_type:G': 1.0, 'GPU': 2.0, 'object_store_memory': 17054784307.0, 'memory': 34109568615.0, 'node:192.168.1.109': 1.0, 'CPU': 48.0} [2023-09-28 09:18:44,993][WARNING] this worker ip: 192.168.1.109 [2023-09-28 09:18:44,994][WARNING] Automatically set master ip to local ip address: 192.168.1.109 [2023-09-28 09:18:46,480][INFO] AgentManager initialized [2023-09-28 09:18:46,514][WARNING] use meta solver type: nash [2023-09-28 09:18:46,991][INFO] PBTRunner psro initialized [2023-09-28 09:18:46,991][INFO] PolicyFactory_agent_0_default new
policy ctr starts at -1 [2023-09-28 09:18:46,995][WARNING] use model type: gr_football.built_in_11 (pid=47592) [2023-09-28 09:18:49,787][INFO] DataServer initialized (pid=47595) [2023-09-28 09:18:49,798][INFO] PolicyServer initialized [2023-09-28 09:18:50,411][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in [2023-09-28 09:18:50,426][WARNING] use model type: gr_football.basic_11 [2023-09-28 09:18:50,479][WARNING] agent_0: agent_0-default-0 is initialized from random [2023-09-28 09:18:50,479][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False} [2023-09-28 09:18:50,523][WARNING] after initialization:

policy_ids: ['built_in_11', 'agent_0-default-0'] populations:

policy_ids:['built_in_11', 'agent_0-default-0']

policy_ids:['built_in_11', 'agent_0-default-0'] [2023-09-28 09:18:50,524][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}] (pid=47611) [2023-09-28 09:18:51,072][INFO] TrainingManager initialized (pid=47610) [2023-09-28 09:18:51,149][INFO] RolloutManager initialized (pid=47606) [2023-09-28 09:19:02,415][INFO] DataPrefetcher initialized (pid=47599) [2023-09-28 09:19:02,593][INFO] trainer_1 (local rank: 1) initialized (pid=47609) [2023-09-28 09:19:02,603][INFO] trainer_0 (local rank: 0) initialized Elo = dict_items([('built_in_11', 1015.631846603239), ('agent_0-default-0', 984.368153396761)]) [2023-09-28 09:30:57,920][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.28, 'lose': 0.28, 'my_goal': 0.43, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 3.883116883116883, 'goal_diff': 3.883116883116883}],[('agent_0-default-0', 'built_in_11'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -3.883116883116883}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.25, 'lose': 0.25, 'my_goal': 0.42, 'goal_diff': 0.0}], (pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail. 
(pid=47605) fig = plt.figure() (pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator (pid=47605) ax.set_xticklabels([""] + xpid, rotation=90) (pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator (pid=47605) ax.set_yticklabels([""] + ypid) [2023-09-28 09:30:58,519][INFO] payoff table: +-------------+---------------+-------------+ | | built_in_11 | default-0 | +=============+===============+=============+ | built_in_11 | +0 | +100 | +-------------+---------------+-------------+ | default-0 | -100 | +0 | +-------------+---------------+-------------+ [2023-09-28 09:30:58,520][INFO] default-0's top 10 worst opponents are: +-------------+----------+ | policy_id | payoff | +=============+==========+ | built_in_11 | -100.00 | +-------------+----------+ | default-0 | +0.00 | +-------------+----------+ [2023-09-28 09:31:10,202][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0 [2023-09-28 09:31:10,203][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False} [2023-09-28 09:31:10,223][WARNING] ********** Generation[0] Agent[agent_0] START ********** [2023-09-28 09:31:10,223][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 
1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={}) (pid=47592) [2023-09-28 09:31:10,243][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}} (pid=47592) [2023-09-28 09:31:10,248][INFO] DataServer created data table agent_0-default-1 (pid=47610) [2023-09-28 09:31:10,281][INFO] Rollout 1 (pid=47599) [2023-09-28 09:31:10,431][INFO] local_rank: 1 cuda_visible_devices:1 (pid=47609) [2023-09-28 09:31:10,405][INFO] local_rank: 0 cuda_visible_devices:0 (pid=47599) [2023-09-28 09:31:12,242][WARNING] trainer_1 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}}) (pid=47609) [2023-09-28 09:31:12,229][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}}) (pid=47609) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and 
PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.) (pid=47609) value = torch.FloatTensor(value) (pid=47599) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.) 
(pid=47599) value = torch.FloatTensor(value) (pid=47610) [2023-09-28 09:32:56,022][WARNING] save the best model(average reward:-5092.5,average win:0.0) (pid=47610) [2023-09-28 09:32:56,081][INFO] Rollout 2 (pid=47610) [2023-09-28 09:34:40,549][WARNING] save the best model(average reward:-3465.0,average win:0.0) (pid=47610) [2023-09-28 09:34:40,601][INFO] Rollout 3 (pid=47611) 2023-09-28 09:35:41,233 ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::DistributedTrainer.optimize() (pid=47599, ip=192.168.1.109, repr=) (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize (pid=47611) training_info = self.trainer.optimize(batch) (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize (pid=47611) tmp_opt_result = self.loss(mini_batch) (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__ (pid=47611) return tensor_cast( (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap (pid=47611) rets = func(*new_args, **kwargs) (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute (pid=47611) values, action_log_probs, dist_entropy = self._evaluate_actions( (pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions (pid=47611) dist = torch.distributions.Categorical(logits=logits) (pid=47611) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__ (pid=47611) super().__init__(batch_shape, validate_args=validate_args) (pid=47611) File 
"/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__ (pid=47611) raise ValueError( (pid=47611) ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: (pid=47611) tensor([[nan, nan, nan, ..., nan, nan, nan], (pid=47611) [nan, nan, nan, ..., nan, nan, nan], (pid=47611) [nan, nan, nan, ..., nan, nan, nan], (pid=47611) ..., (pid=47611) [nan, nan, nan, ..., nan, nan, nan], (pid=47611) [nan, nan, nan, ..., nan, nan, nan], (pid=47611) [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', (pid=47611) grad_fn=) (pid=47610) [2023-09-28 09:35:41,283][INFO] Saving model agent_0 agent_0-default-1 3 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-18-44/agent_0/agent_0-default-1/3 Traceback (most recent call last): File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in main() File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 114, in main runner.run() File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run ray.get(training_task_ref) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper return func(*args, **kwargs) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=47611, ip=192.168.1.109, repr=) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper return func(self, *args, **kwargs) File 
"/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train statistics_list = ray.get( ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=47609, ip=192.168.1.109, repr=) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize training_info = self.trainer.optimize(batch) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize tmp_opt_result = self.loss(mini_batch) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__ return tensor_cast( File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap rets = func(*new_args, **kwargs) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute values, action_log_probs, dist_entropy = self._evaluate_actions( File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions dist = torch.distributions.Categorical(logits=logits) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__ super().__init__(batch_shape, validate_args=validate_args) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__ raise ValueError( ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., 
nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', grad_fn=) ` i am not sure if it was a hardware issure, so i tried training with just one TITAN X, but it still generated the following error message: ` (/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml [2023-09-28 09:55:44,004][WARNING] No active cluster detected, will create local ray instance. [2023-09-28 09:55:52,920][WARNING] ============== Cluster Info ============== {'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830', 'metrics_export_port': 58593, 'node_id': '0b4c8573ddd5462ff763c6db9c7b0cd22dbe01d81d14b7398a7e5ece'} [2023-09-28 09:55:52,923][WARNING] * cluster resources: {'object_store_memory': 17818028851.0, 'GPU': 2.0, 'accelerator_type:G': 1.0, 'node:192.168.1.109': 1.0, 'memory': 35636057703.0, 'CPU': 48.0} [2023-09-28 09:55:52,923][WARNING] this worker ip: 192.168.1.109 [2023-09-28 09:55:52,924][WARNING] Automatically set master ip to local ip address: 192.168.1.109 [2023-09-28 09:55:54,333][INFO] AgentManager initialized [2023-09-28 09:55:54,366][WARNING] use meta solver type: nash [2023-09-28 09:55:54,844][INFO] PBTRunner psro initialized [2023-09-28 09:55:54,845][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1 [2023-09-28 09:55:54,849][WARNING] use model type: gr_football.built_in_11 (pid=37950) [2023-09-28 09:55:57,624][INFO] PolicyServer initialized (pid=37956) [2023-09-28 09:55:57,675][INFO] DataServer initialized [2023-09-28 
09:55:58,195][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in [2023-09-28 09:55:58,210][WARNING] use model type: gr_football.basic_11 [2023-09-28 09:55:58,257][WARNING] agent_0: agent_0-default-0 is initialized from random [2023-09-28 09:55:58,257][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False} [2023-09-28 09:55:58,286][WARNING] after initialization: policy_ids: ['built_in_11', 'agent_0-default-0'] populations:

policy_ids:['built_in_11', 'agent_0-default-0']

policy_ids:['built_in_11', 'agent_0-default-0'] [2023-09-28 09:55:58,287][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}] (pid=37940) [2023-09-28 09:55:58,899][INFO] TrainingManager initialized (pid=37954) [2023-09-28 09:55:58,891][INFO] RolloutManager initialized (pid=37970) [2023-09-28 09:56:08,109][INFO] trainer_0 (local rank: 0) initialized (pid=37957) [2023-09-28 09:56:08,385][INFO] DataPrefetcher initialized Elo = dict_items([('built_in_11', 1015.3241542955467), ('agent_0-default-0', 984.6758457044533)]) [2023-09-28 10:07:43,192][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 0.0, 'score': 0.5, 'win': 0.27, 'lose': 0.27, 'my_goal': 0.5, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 0.9807692307692307, 'score': 0.9903846153846154, 'win': 0.9807692307692308, 'lose': 0.0, 'my_goal': 4.035256410256411, 'goal_diff': 4.035256410256411}],[('agent_0-default-0', 'built_in_11'):{'payoff': -0.9807692307692308, 'score': 0.009615384615384616, 'win': 0.0, 'lose': 0.9807692307692308, 'my_goal': 0.0, 'goal_diff': -4.035256410256411}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.29000000000000004, 'lose': 0.29000000000000004, 'my_goal': 0.44, 'goal_diff': 0.0}], (pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail. 
(pid=37960) fig = plt.figure() (pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator (pid=37960) ax.set_xticklabels([""] + xpid, rotation=90) (pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator (pid=37960) ax.set_yticklabels([""] + ypid) [2023-09-28 10:07:43,815][INFO] payoff table: +-------------+---------------+-------------+ | | built_in_11 | default-0 | +=============+===============+=============+ | built_in_11 | +0 | +98 | +-------------+---------------+-------------+ | default-0 | -98 | +0 | +-------------+---------------+-------------+ [2023-09-28 10:07:43,816][INFO] default-0's top 10 worst opponents are: +-------------+----------+ | policy_id | payoff | +=============+==========+ | built_in_11 | -98.08 | +-------------+----------+ | default-0 | +0.00 | +-------------+----------+ [2023-09-28 10:07:56,080][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0 [2023-09-28 10:07:56,081][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False} [2023-09-28 10:07:56,107][WARNING] ********** Generation[0] Agent[agent_0] START ********** [2023-09-28 10:07:56,107][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 
'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={}) (pid=37956) [2023-09-28 10:07:56,125][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}} (pid=37956) [2023-09-28 10:07:56,129][INFO] DataServer created data table agent_0-default-1 (pid=37954) [2023-09-28 10:07:56,159][INFO] Rollout 1 (pid=37970) [2023-09-28 10:07:56,375][INFO] local_rank: 0 cuda_visible_devices:0 (pid=37970) [2023-09-28 10:07:57,988][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}}) (pid=37970) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.) 
(pid=37970) value = torch.FloatTensor(value) (pid=37954) [2023-09-28 10:09:29,829][WARNING] save the best model(average reward:-5103.75,average win:0.0) (pid=37954) [2023-09-28 10:09:29,896][INFO] Rollout 2 (pid=37954) [2023-09-28 10:11:04,900][WARNING] save the best model(average reward:-3472.5,average win:0.0) (pid=37954) [2023-09-28 10:11:04,950][INFO] Rollout 3 (pid=37954) [2023-09-28 10:12:38,904][WARNING] save the best model(average reward:-2661.875,average win:0.0) (pid=37954) [2023-09-28 10:12:38,938][INFO] Rollout 4 (pid=37954) [2023-09-28 10:14:12,399][WARNING] save the best model(average reward:-2166.5,average win:0.0) (pid=37954) [2023-09-28 10:14:12,440][INFO] Rollout 5 (pid=37960) Exception ignored in: (pid=37960) Traceback (most recent call last): (pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 4016, in __del__ (pid=37960) self.tk.call('image', 'delete', self.name) (pid=37960) RuntimeError: main thread is not in main loop (pid=37960) Exception ignored in: (pid=37960) Traceback (most recent call last): (pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__ (pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)): (pid=37960) RuntimeError: main thread is not in main loop (pid=37970) [2023-09-28 10:15:54,407][WARNING] queue is full. May have bugs in training. 
(pid=37960) Exception ignored in: (pid=37960) Traceback (most recent call last): (pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__ (pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)): (pid=37960) RuntimeError: main thread is not in main loop (pid=37960) Exception ignored in: (pid=37960) Traceback (most recent call last): (pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__ (pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)): (pid=37960) RuntimeError: main thread is not in main loop (pid=37960) Exception ignored in: (pid=37960) Traceback (most recent call last): (pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__ (pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)): (pid=37960) RuntimeError: main thread is not in main loop (pid=37954) [2023-09-28 10:15:57,987][WARNING] save the best model(average reward:-1838.75,average win:0.0) (pid=37954) [2023-09-28 10:15:58,037][INFO] Rollout 6 (pid=37954) [2023-09-28 10:17:20,960][WARNING] save the best model(average reward:-1609.642857142857,average win:0.0) (pid=37954) [2023-09-28 10:17:21,004][INFO] Rollout 7 (pid=37954) [2023-09-28 10:18:54,245][WARNING] save the best model(average reward:-1433.125,average win:0.0) (pid=37954) [2023-09-28 10:18:54,289][INFO] Rollout 8 (pid=37954) [2023-09-28 10:20:04,518][INFO] Saving model agent_0 agent_0-default-1 8 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-55-52/agent_0/agent_0-default-1/8 Traceback (most recent call last): File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in main() File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 
114, in main runner.run() File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run ray.get(training_task_ref) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper return func(*args, **kwargs) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=37940, ip=192.168.1.109, repr=) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper return func(self, *args, **kwargs) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train statistics_list = ray.get( ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=37970, ip=192.168.1.109, repr=) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize training_info = self.trainer.optimize(batch) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize tmp_opt_result = self.loss(mini_batch) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__ return tensor_cast( File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap rets = func(*new_args, **kwargs) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute values, action_log_probs, dist_entropy = self._evaluate_actions( File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions dist = 
torch.distributions.Categorical(logits=logits) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__ super().__init__(batch_shape, validate_args=validate_args) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__ raise ValueError( ValueError: Expected parameter logits (Tensor of shape (80000, 19)) of distribution Categorical(logits: torch.Size([80000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', grad_fn=) ` Do you know why this happened?
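
For anyone reading the traceback: in both runs the final ValueError is raised because every logit passed to torch.distributions.Categorical is NaN, so the tensor fails the distribution's real-valued constraint. The snippet below is only a plain-Python sketch of that kind of check, not code from DB-Football or PyTorch. The helper names `non_finite_rows` and `softmax` are hypothetical, and the sample values are made up for illustration.

```python
import math


def non_finite_rows(logits):
    """Return the indices of rows that contain NaN or +/-inf."""
    return [
        i for i, row in enumerate(logits)
        if not all(math.isfinite(x) for x in row)
    ]


def softmax(row):
    """Max-shifted softmax; a NaN input makes every output NaN."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: row 0 contains a NaN, row 2 contains an inf.
healthy = [[0.1, -1.2, 3.0], [0.0, 0.0, 0.0]]
broken = [[0.1, float("nan"), 3.0], [0.0, 0.0, 0.0], [float("inf"), 1.0, 2.0]]

print(non_finite_rows(healthy))
print(non_finite_rows(broken))
print(softmax(broken[0]))
```

A single NaN logit is enough to turn the whole row of probabilities into NaN, which is why such values cannot define a categorical distribution. Running a similar check on the policy outputs, or on the observations and rewards that feed them, may help locate the first update at which values stop being finite.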

YanSong97 commented 1 year ago

Hi Jay:

What worker_num and batch size did you use? Have you tried different values?

Jay-Vim-Lv commented 1 year ago

num_workers=20 or 30, batch_size=8 or 32, among other values. Nothing else has been changed.

qyh-stbz commented 1 year ago

I'm also getting the same error.

ZHQ-air commented 12 months ago

Hi, I have also encountered a problem similar to Jay-Vim-Lv's. Do you know how to solve it? The error output is as follows:

(light-malib) zhq@zhq-Taitan:~/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1$ python main_pbt.py --config light_malib/expr/gr_football/expr_5_vs_5_psro.yaml
[2023-11-07 16:02:59,921][WARNING] No active cluster detected, will create local ray instance.
[2023-11-07 16:03:01,223][WARNING] ============== Cluster Info ==============
{'node_ip_address': '10.1.80.147', 'raylet_ip_address': '10.1.80.147', 'redis_address': '10.1.80.147:6379', 'object_store_address': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030', 'metrics_export_port': 60763, 'node_id': '2841381510c4b1ba545ad7dcb7719998de2b9228147bcb839aa9b7d0'}
[2023-11-07 16:03:01,227][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'memory': 37538726708.0, 'GPU': 1.0, 'CPU': 12.0, 'object_store_memory': 18769363353.0, 'node:10.1.80.147': 1.0}
[2023-11-07 16:03:01,228][WARNING] this worker ip: 10.1.80.147
[2023-11-07 16:03:01,232][WARNING] Automatically set master ip to local ip address: 10.1.80.147
[2023-11-07 16:03:01,747][INFO] AgentManager initialized
[2023-11-07 16:03:01,754][WARNING] use meta solver type: nash
[2023-11-07 16:03:01,839][INFO] PBTRunner psro initialized
[2023-11-07 16:03:01,839][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-11-07 16:03:01,840][WARNING] use model type: gr_football.built_in_5
(pid=687111) [2023-11-07 16:03:02,545][INFO] DataServer initialized
(pid=687117) [2023-11-07 16:03:02,596][INFO] PolicyServer initialized
[2023-11-07 16:03:02,694][INFO] Load initial policy built_in_5 from light_malib/trained_models/gr_football/5_vs_5/built_in
[2023-11-07 16:03:02,696][WARNING] use model type: gr_football.basic_5
[2023-11-07 16:03:02,704][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-11-07 16:03:02,704][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:03:02,716][WARNING] after initialization:

<A agent_0>
policy_ids:
['built_in_5', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_5', 'agent_0-default-0']<P default> policy_ids:['built_in_5', 'agent_0-default-0']

[2023-11-07 16:03:02,716][WARNING] Evaluation rollouts (num: 5) for 3 policy combinations: [{'agent_0': {'built_in_5': 1.0}, 'agent_1': {'built_in_5': 1.0}}, {'agent_0': {'built_in_5': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=687118) [2023-11-07 16:03:02,857][INFO] RolloutManager initialized
(pid=687114) [2023-11-07 16:03:03,011][INFO] TrainingManager initialized
(pid=687108) [2023-11-07 16:03:04,067][INFO] trainer_0 (local rank: 0) initialized
(pid=687107) [2023-11-07 16:03:04,142][INFO] DataPrefetcher initialized
Elo = dict_items([('agent_0-default-0', 984.368153396761), ('built_in_5', 1015.631846603239)])
[2023-11-07 16:04:22,723][INFO] policy_data: [('built_in_5', 'built_in_5'):{'payoff': 0.0, 'score': 0.5, 'win': 0.1, 'lose': 0.1, 'my_goal': 0.2, 'goal_diff': 0.0}],[('built_in_5', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 1.5, 'goal_diff': 1.5}],[('agent_0-default-0', 'built_in_5'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -1.5}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.4, 'lose': 0.4, 'my_goal': 0.5, 'goal_diff': 0.0}],
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:66: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_xticklabels([""] + xpid, rotation=90)
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:67: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113)   ax.set_yticklabels([""] + ypid)
[2023-11-07 16:04:22,839][INFO] payoff table:
+------------+--------------+-------------+
|            |   built_in_5 |   default-0 |
+============+==============+=============+
| built_in_5 |           +0 |        +100 |
+------------+--------------+-------------+
| default-0  |         -100 |          +0 |
+------------+--------------+-------------+
[2023-11-07 16:04:22,839][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id   |   payoff |
+=============+==========+
| built_in_5  |  -100.00 |
+-------------+----------+
| default-0   |    +0.00 |
+-------------+----------+
[2023-11-07 16:04:28,836][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-11-07 16:04:28,836][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:04:28,843][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-11-07 16:04:28,843][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fe0f9dfe970>, kwargs={})
(pid=687111) [2023-11-07 16:04:28,870][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 24}}
(pid=687111) [2023-11-07 16:04:28,872][INFO] DataServer created data table agent_0-default-1
(pid=687118) [2023-11-07 16:04:28,879][INFO] Rollout 1
(pid=687108) [2023-11-07 16:04:28,890][INFO] local_rank: 0 cuda_visible_devices:0
(pid=687108) [2023-11-07 16:04:30,187][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f135e001dc0>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:10.1.80.147': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=687118) [2023-11-07 16:04:55,473][WARNING] save the best model(average reward:-5020.0,average win:0.0)
(pid=687118) [2023-11-07 16:04:55,495][INFO] Rollout 2
(pid=687118) [2023-11-07 16:05:21,904][WARNING] save the best model(average reward:-3370.6666666666665,average win:0.0)
(pid=687118) [2023-11-07 16:05:21,925][INFO] Rollout 3
(pid=687118) [2023-11-07 16:05:53,902][WARNING] save the best model(average reward:-2539.0,average win:0.0)
(pid=687118) [2023-11-07 16:05:53,923][INFO] Rollout 4
(pid=687118) [2023-11-07 16:06:24,837][WARNING] save the best model(average reward:-2048.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:24,856][INFO] Rollout 5
(pid=687118) [2023-11-07 16:06:55,875][WARNING] save the best model(average reward:-1714.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:55,894][INFO] Rollout 6
(pid=687118) [2023-11-07 16:07:26,569][WARNING] save the best model(average reward:-1477.142857142857,average win:0.0)
(pid=687118) [2023-11-07 16:07:26,605][INFO] Rollout 7
(pid=687118) [2023-11-07 16:07:57,201][WARNING] save the best model(average reward:-1300.5,average win:0.0)
(pid=687118) [2023-11-07 16:07:57,220][INFO] Rollout 8
(pid=687118) [2023-11-07 16:08:28,308][WARNING] save the best model(average reward:-1162.6666666666667,average win:0.0)
(pid=687118) [2023-11-07 16:08:28,330][INFO] Rollout 9
(pid=687118) [2023-11-07 16:08:58,670][WARNING] save the best model(average reward:-1054.0,average win:0.0)
(pid=687118) [2023-11-07 16:08:58,688][INFO] Rollout 10
(pid=687118) [2023-11-07 16:09:30,212][WARNING] save the best model(average reward:-960.7272727272727,average win:0.0)
(pid=687118) [2023-11-07 16:09:30,234][INFO] Rollout 11
(pid=687118) [2023-11-07 16:10:00,340][WARNING] save the best model(average reward:-883.6666666666666,average win:0.0)
(pid=687118) [2023-11-07 16:10:00,362][INFO] Rollout 12
(pid=687118) [2023-11-07 16:10:32,308][WARNING] save the best model(average reward:-818.1538461538462,average win:0.0)
(pid=687118) [2023-11-07 16:10:32,333][INFO] Rollout 13
(pid=687118) [2023-11-07 16:11:04,471][WARNING] save the best model(average reward:-762.2857142857143,average win:0.0)
(pid=687118) [2023-11-07 16:11:04,495][INFO] Rollout 14
(pid=687118) [2023-11-07 16:11:34,548][WARNING] save the best model(average reward:-713.6,average win:0.0)
(pid=687118) [2023-11-07 16:11:34,572][INFO] Rollout 15
(pid=687118) [2023-11-07 16:12:05,414][WARNING] save the best model(average reward:-672.25,average win:0.0)
(pid=687118) [2023-11-07 16:12:05,435][INFO] Rollout 16
(pid=687118) [2023-11-07 16:12:35,794][WARNING] save the best model(average reward:-635.5294117647059,average win:0.0)
(pid=687118) [2023-11-07 16:12:35,812][INFO] Rollout 17
(pid=687118) [2023-11-07 16:13:05,773][WARNING] save the best model(average reward:-602.8888888888889,average win:0.0)
(pid=687118) [2023-11-07 16:13:05,796][INFO] Rollout 18
(pid=687118) [2023-11-07 16:13:36,861][WARNING] save the best model(average reward:-574.3157894736842,average win:0.0)
(pid=687118) [2023-11-07 16:13:36,877][INFO] Rollout 19
(pid=687118) [2023-11-07 16:14:07,636][WARNING] save the best model(average reward:-547.4,average win:0.0)
(pid=687118) [2023-11-07 16:14:07,653][INFO] Rollout 20
(pid=687118) [2023-11-07 16:14:38,884][WARNING] save the best model(average reward:-48.8,average win:0.0)
(pid=687118) [2023-11-07 16:14:38,905][INFO] Rollout 21
(pid=687118) [2023-11-07 16:15:08,913][WARNING] save the best model(average reward:-48.6,average win:0.0)
(pid=687118) [2023-11-07 16:15:08,931][INFO] Rollout 22
(pid=687118) [2023-11-07 16:15:38,800][WARNING] save the best model(average reward:-48.2,average win:0.0)
(pid=687118) [2023-11-07 16:15:38,820][INFO] Rollout 23
(pid=687118) [2023-11-07 16:16:07,657][INFO] Rollout 24
(pid=687118) [2023-11-07 16:16:38,027][WARNING] save the best model(average reward:-47.0,average win:0.0)
(pid=687118) [2023-11-07 16:16:38,044][INFO] Rollout 25
(pid=687118) [2023-11-07 16:17:07,691][INFO] Rollout 26
...
(pid=687118) [2023-11-07 18:14:16,224][INFO] Rollout 264
(pid=687118) [2023-11-07 18:14:47,594][INFO] Rollout 265
Traceback (most recent call last):
  File "main_pbt.py", line 126, in <module>
    main()
  File "main_pbt.py", line 114, in main
    runner.run()
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/framework/pbt_runner.py", line 111, in run
    ray.get(training_task_ref)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=687114, ip=10.1.80.147, repr=<light_malib.training.training_manager.TrainingManager object at 0x7fd1fa036340>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/decorator.py", line 22, in wrapper
    return func(self, *args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/training_manager.py", line 146, in train
    statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=687108, ip=10.1.80.147, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7f135e001b80>)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/distributed_trainer.py", line 200, in optimize
    training_info = self.trainer.optimize(batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
    tmp_opt_result = self.loss(mini_batch)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/common/loss_func.py", line 70, in __call__
    return tensor_cast(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/general.py", line 110, in wrap
    rets = func(*new_args, **kwargs)
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
    values, action_log_probs, dist_entropy = self._evaluate_actions(
  File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
    dist = torch.distributions.Categorical(logits=logits)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (96000, 19)) of distribution Categorical(logits: torch.Size([96000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<SubBackward0>)
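
For anyone debugging this, the traceback only shows where the NaN logits were finally rejected, not the training step where they first appeared. Below is a minimal sketch of an early guard, written with the standard library only so it runs anywhere. The function names `first_non_finite_row` and `check_logits` are hypothetical and are not part of DB-Football. In the real loss function the same idea would presumably be applied to the tensor itself (for example with `torch.isfinite`) just before `Categorical` is constructed.

```python
import math


def first_non_finite_row(logits):
    """Return the index of the first row containing NaN or +/-inf, else None."""
    for i, row in enumerate(logits):
        if not all(math.isfinite(x) for x in row):
            return i
    return None


def check_logits(logits, step):
    """Fail early and report the step, instead of failing inside Categorical."""
    bad = first_non_finite_row(logits)
    if bad is not None:
        raise FloatingPointError(
            f"non-finite logits at step {step}, first bad row {bad}"
        )
    return logits


# Illustrative data only: two healthy rows, then a batch with one NaN row.
healthy = [[0.1, -0.3, 2.0], [1.5, 0.0, -1.0]]
diverged = [[0.1, -0.3, 2.0], [float("nan")] * 3]

print(first_non_finite_row(healthy))
print(first_non_finite_row(diverged))
```

Calling a check like this once per optimization step would narrow down when the values diverge, which may help tell a bad batch apart from weights that have already become NaN.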

YanSong97 commented 12 months ago

Hi, I have just uploaded a demo config. Feel free to try it out.

Also, my local PyTorch version is 1.13.0, and I cannot reproduce this error. Which PyTorch version are you using?
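
One quick way to report the installed versions is sketched below. It uses only the standard library (`importlib.metadata`, available from Python 3.8), so it works even if importing the package itself fails. The helper name `installed_version` is just an illustration, and checking `ray` alongside `torch` is an assumption about what is useful to report here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the packages that appear in the traceback.
for name in ("torch", "ray"):
    print(name, installed_version(name))
```

Run this inside the same conda environment that is used for training, since the versions can differ between environments.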

ZHQ-air commented 12 months ago

Thank you very much for your response. The error no longer occurs when I use expr_10_vs_10_psro.yaml with batch_size=100 and num_workers=5.