Open Jay-Vim-Lv opened 1 year ago
Hi Jay:
What worker_num and batch size did you use? Have you tried different values?
num_workers=20 or 30, batch_size=8 or 32, or other values; nothing else has been changed.
I'm also getting the same error
Hi, I have also encountered a problem similar to Jay-Vim-Lv's. Do you know how to solve it? The error output is as follows:
(light-malib) zhq@zhq-Taitan:~/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1$ python main_pbt.py --config light_malib/expr/gr_football/expr_5_vs_5_psro.yaml
[2023-11-07 16:02:59,921][WARNING] No active cluster detected, will create local ray instance.
[2023-11-07 16:03:01,223][WARNING] ============== Cluster Info ==============
{'node_ip_address': '10.1.80.147', 'raylet_ip_address': '10.1.80.147', 'redis_address': '10.1.80.147:6379', 'object_store_address': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-11-07_16-02-59_922304_687030', 'metrics_export_port': 60763, 'node_id': '2841381510c4b1ba545ad7dcb7719998de2b9228147bcb839aa9b7d0'}
[2023-11-07 16:03:01,227][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'memory': 37538726708.0, 'GPU': 1.0, 'CPU': 12.0, 'object_store_memory': 18769363353.0, 'node:10.1.80.147': 1.0}
[2023-11-07 16:03:01,228][WARNING] this worker ip: 10.1.80.147
[2023-11-07 16:03:01,232][WARNING] Automatically set master ip to local ip address: 10.1.80.147
[2023-11-07 16:03:01,747][INFO] AgentManager initialized
[2023-11-07 16:03:01,754][WARNING] use meta solver type: nash
[2023-11-07 16:03:01,839][INFO] PBTRunner psro initialized
[2023-11-07 16:03:01,839][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-11-07 16:03:01,840][WARNING] use model type: gr_football.built_in_5
(pid=687111) [2023-11-07 16:03:02,545][INFO] DataServer initialized
(pid=687117) [2023-11-07 16:03:02,596][INFO] PolicyServer initialized
[2023-11-07 16:03:02,694][INFO] Load initial policy built_in_5 from light_malib/trained_models/gr_football/5_vs_5/built_in
[2023-11-07 16:03:02,696][WARNING] use model type: gr_football.basic_5
[2023-11-07 16:03:02,704][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-11-07 16:03:02,704][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:03:02,716][WARNING] after initialization:
<A agent_0>
policy_ids:
['built_in_5', 'agent_0-default-0']
populations:
<P __all__> policy_ids:['built_in_5', 'agent_0-default-0']<P default> policy_ids:['built_in_5', 'agent_0-default-0']
[2023-11-07 16:03:02,716][WARNING] Evaluation rollouts (num: 5) for 3 policy combinations: [{'agent_0': {'built_in_5': 1.0}, 'agent_1': {'built_in_5': 1.0}}, {'agent_0': {'built_in_5': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=687118) [2023-11-07 16:03:02,857][INFO] RolloutManager initialized
(pid=687114) [2023-11-07 16:03:03,011][INFO] TrainingManager initialized
(pid=687108) [2023-11-07 16:03:04,067][INFO] trainer_0 (local rank: 0) initialized
(pid=687107) [2023-11-07 16:03:04,142][INFO] DataPrefetcher initialized
Elo = dict_items([('agent_0-default-0', 984.368153396761), ('built_in_5', 1015.631846603239)])
[2023-11-07 16:04:22,723][INFO] policy_data: [('built_in_5', 'built_in_5'):{'payoff': 0.0, 'score': 0.5, 'win': 0.1, 'lose': 0.1, 'my_goal': 0.2, 'goal_diff': 0.0}],[('built_in_5', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 1.5, 'goal_diff': 1.5}],[('agent_0-default-0', 'built_in_5'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -1.5}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.4, 'lose': 0.4, 'my_goal': 0.5, 'goal_diff': 0.0}],
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:66: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113) ax.set_xticklabels([""] + xpid, rotation=90)
(pid=687113) /home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/monitor/monitor.py:67: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=687113) ax.set_yticklabels([""] + ypid)
[2023-11-07 16:04:22,839][INFO] payoff table:
+------------+--------------+-------------+
| | built_in_5 | default-0 |
+============+==============+=============+
| built_in_5 | +0 | +100 |
+------------+--------------+-------------+
| default-0 | -100 | +0 |
+------------+--------------+-------------+
[2023-11-07 16:04:22,839][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id | payoff |
+=============+==========+
| built_in_5 | -100.00 |
+-------------+----------+
| default-0 | +0.00 |
+-------------+----------+
[2023-11-07 16:04:28,836][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-11-07 16:04:28,836][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-11-07 16:04:28,843][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-11-07 16:04:28,843][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7fe0f9dfe970>, kwargs={})
(pid=687111) [2023-11-07 16:04:28,870][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 24}}
(pid=687111) [2023-11-07 16:04:28,872][INFO] DataServer created data table agent_0-default-1
(pid=687118) [2023-11-07 16:04:28,879][INFO] Rollout 1
(pid=687108) [2023-11-07 16:04:28,890][INFO] local_rank: 0 cuda_visible_devices:0
(pid=687108) [2023-11-07 16:04:30,187][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_5', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=<light_malib.framework.scheduler.stopper.common.win_rate_stopper.WinRateStopper object at 0x7f135e001dc0>, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:10.1.80.147': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=687118) [2023-11-07 16:04:55,473][WARNING] save the best model(average reward:-5020.0,average win:0.0)
(pid=687118) [2023-11-07 16:04:55,495][INFO] Rollout 2
(pid=687118) [2023-11-07 16:05:21,904][WARNING] save the best model(average reward:-3370.6666666666665,average win:0.0)
(pid=687118) [2023-11-07 16:05:21,925][INFO] Rollout 3
(pid=687118) [2023-11-07 16:05:53,902][WARNING] save the best model(average reward:-2539.0,average win:0.0)
(pid=687118) [2023-11-07 16:05:53,923][INFO] Rollout 4
(pid=687118) [2023-11-07 16:06:24,837][WARNING] save the best model(average reward:-2048.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:24,856][INFO] Rollout 5
(pid=687118) [2023-11-07 16:06:55,875][WARNING] save the best model(average reward:-1714.0,average win:0.0)
(pid=687118) [2023-11-07 16:06:55,894][INFO] Rollout 6
(pid=687118) [2023-11-07 16:07:26,569][WARNING] save the best model(average reward:-1477.142857142857,average win:0.0)
(pid=687118) [2023-11-07 16:07:26,605][INFO] Rollout 7
(pid=687118) [2023-11-07 16:07:57,201][WARNING] save the best model(average reward:-1300.5,average win:0.0)
(pid=687118) [2023-11-07 16:07:57,220][INFO] Rollout 8
(pid=687118) [2023-11-07 16:08:28,308][WARNING] save the best model(average reward:-1162.6666666666667,average win:0.0)
(pid=687118) [2023-11-07 16:08:28,330][INFO] Rollout 9
(pid=687118) [2023-11-07 16:08:58,670][WARNING] save the best model(average reward:-1054.0,average win:0.0)
(pid=687118) [2023-11-07 16:08:58,688][INFO] Rollout 10
(pid=687118) [2023-11-07 16:09:30,212][WARNING] save the best model(average reward:-960.7272727272727,average win:0.0)
(pid=687118) [2023-11-07 16:09:30,234][INFO] Rollout 11
(pid=687118) [2023-11-07 16:10:00,340][WARNING] save the best model(average reward:-883.6666666666666,average win:0.0)
(pid=687118) [2023-11-07 16:10:00,362][INFO] Rollout 12
(pid=687118) [2023-11-07 16:10:32,308][WARNING] save the best model(average reward:-818.1538461538462,average win:0.0)
(pid=687118) [2023-11-07 16:10:32,333][INFO] Rollout 13
(pid=687118) [2023-11-07 16:11:04,471][WARNING] save the best model(average reward:-762.2857142857143,average win:0.0)
(pid=687118) [2023-11-07 16:11:04,495][INFO] Rollout 14
(pid=687118) [2023-11-07 16:11:34,548][WARNING] save the best model(average reward:-713.6,average win:0.0)
(pid=687118) [2023-11-07 16:11:34,572][INFO] Rollout 15
(pid=687118) [2023-11-07 16:12:05,414][WARNING] save the best model(average reward:-672.25,average win:0.0)
(pid=687118) [2023-11-07 16:12:05,435][INFO] Rollout 16
(pid=687118) [2023-11-07 16:12:35,794][WARNING] save the best model(average reward:-635.5294117647059,average win:0.0)
(pid=687118) [2023-11-07 16:12:35,812][INFO] Rollout 17
(pid=687118) [2023-11-07 16:13:05,773][WARNING] save the best model(average reward:-602.8888888888889,average win:0.0)
(pid=687118) [2023-11-07 16:13:05,796][INFO] Rollout 18
(pid=687118) [2023-11-07 16:13:36,861][WARNING] save the best model(average reward:-574.3157894736842,average win:0.0)
(pid=687118) [2023-11-07 16:13:36,877][INFO] Rollout 19
(pid=687118) [2023-11-07 16:14:07,636][WARNING] save the best model(average reward:-547.4,average win:0.0)
(pid=687118) [2023-11-07 16:14:07,653][INFO] Rollout 20
(pid=687118) [2023-11-07 16:14:38,884][WARNING] save the best model(average reward:-48.8,average win:0.0)
(pid=687118) [2023-11-07 16:14:38,905][INFO] Rollout 21
(pid=687118) [2023-11-07 16:15:08,913][WARNING] save the best model(average reward:-48.6,average win:0.0)
(pid=687118) [2023-11-07 16:15:08,931][INFO] Rollout 22
(pid=687118) [2023-11-07 16:15:38,800][WARNING] save the best model(average reward:-48.2,average win:0.0)
(pid=687118) [2023-11-07 16:15:38,820][INFO] Rollout 23
(pid=687118) [2023-11-07 16:16:07,657][INFO] Rollout 24
(pid=687118) [2023-11-07 16:16:38,027][WARNING] save the best model(average reward:-47.0,average win:0.0)
(pid=687118) [2023-11-07 16:16:38,044][INFO] Rollout 25
(pid=687118) [2023-11-07 16:17:07,691][INFO] Rollout 26
...
(pid=687118) [2023-11-07 18:14:16,224][INFO] Rollout 264
(pid=687118) [2023-11-07 18:14:47,594][INFO] Rollout 265
Traceback (most recent call last):
File "main_pbt.py", line 126, in <module>
main()
File "main_pbt.py", line 114, in main
runner.run()
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/framework/pbt_runner.py", line 111, in run
ray.get(training_task_ref)
File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=687114, ip=10.1.80.147, repr=<light_malib.training.training_manager.TrainingManager object at 0x7fd1fa036340>)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/decorator.py", line 22, in wrapper
return func(self, *args, **kwargs)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/training_manager.py", line 146, in train
statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=687108, ip=10.1.80.147, repr=<light_malib.training.distributed_trainer.DistributedTrainer object at 0x7f135e001b80>)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/training/distributed_trainer.py", line 200, in optimize
training_info = self.trainer.optimize(batch)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
tmp_opt_result = self.loss(mini_batch)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/common/loss_func.py", line 70, in __call__
return tensor_cast(
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/utils/general.py", line 110, in wrap
rets = func(*new_args, **kwargs)
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
values, action_log_probs, dist_entropy = self._evaluate_actions(
File "/home/zhq/Doctor/AI_Innovation/DecisionMaking/DB-Football_v1.1/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
dist = torch.distributions.Categorical(logits=logits)
File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/categorical.py", line 66, in __init__
super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
File "/home/zhq/software_tools/anaconda3/envs/light-malib/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (96000, 19)) of distribution Categorical(logits: torch.Size([96000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<SubBackward0>)
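The `ValueError` above comes from PyTorch's argument validation in `Categorical`, which rejects logits containing NaN. As a debugging aid, here is a minimal sketch of a guard that reports how many rows are non-finite and which parameter is the first to contain NaN/Inf. It is not part of the DB-Football code: `first_non_finite`, `checked_categorical`, and the toy `Linear` actor are hypothetical names used only for illustration, and the sketch does not identify why the values diverged.

```python
import torch
from torch.distributions import Categorical


def first_non_finite(named_tensors):
    """Return the name of the first tensor containing NaN/Inf, or None."""
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


def checked_categorical(logits, model=None):
    """Build a Categorical, failing with a more specific message on NaN/Inf logits."""
    if not torch.isfinite(logits).all():
        # Count rows that contain at least one non-finite entry.
        bad_rows = (~torch.isfinite(logits)).any(dim=-1).sum().item()
        culprit = None
        if model is not None:
            culprit = first_non_finite(model.named_parameters())
        raise ValueError(
            f"{bad_rows}/{logits.shape[0]} rows of logits are non-finite; "
            f"first non-finite parameter: {culprit}"
        )
    return Categorical(logits=logits)


if __name__ == "__main__":
    # Toy actor with 19 outputs, matching the action dimension in the error message.
    actor = torch.nn.Linear(4, 19)
    obs = torch.randn(8, 4)
    checked_categorical(actor(obs), actor)  # passes: all logits are finite

    # Simulate a diverged update by poisoning one weight (illustration only).
    with torch.no_grad():
        actor.weight[0, 0] = float("nan")
    try:
        checked_categorical(actor(obs), actor)
    except ValueError as err:
        print(err)
```

If the parameters themselves turn out to be NaN, `torch.autograd.set_detect_anomaly(True)` can help locate the backward operation that first produced them, at a noticeable runtime cost.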
Hi, I have just uploaded a demo config. Feel free to try it out.
Also, my local pytorch version is at 1.13.0 and I cannot reproduce this error. Which pytorch version are you using?
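For anyone answering the version question, a small sketch like the following prints the relevant details. The helper name `env_report` is hypothetical; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are standard PyTorch.

```python
import sys

import torch


def env_report():
    """Collect version details that help a maintainer reproduce a bug."""
    return {
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```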
Thank you very much for your response. The error no longer occurs when I use expr_10_vs_10_psro.yaml with batch_size=100 and num_workers=5.
Hi, when I tried to replicate your code, I ran into an issue. I cannot find where the problem is or how to solve it. Could you help me? My environment is built as you recommend; the system is Ubuntu 18.04 LTS with 2 GPUs (a 1080Ti and a Titan X). In the code, I only modified 'num_workers' and 'batch_size' in the YAML file to match my hardware. When I run
python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
it generated the following error message:
`
(/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
[2023-09-28 09:18:34,036][WARNING] No active cluster detected, will create local ray instance.
[2023-09-28 09:18:44,991][WARNING] ============== Cluster Info ==============
{'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-18-34_037912_47469', 'metrics_export_port': 55494, 'node_id': 'a8211a7e16deb107246a6dfd4b68c7d43f1a31ddb9fdba7c482c3b64'}
[2023-09-28 09:18:44,993][WARNING] * cluster resources:
{'accelerator_type:G': 1.0, 'GPU': 2.0, 'object_store_memory': 17054784307.0, 'memory': 34109568615.0, 'node:192.168.1.109': 1.0, 'CPU': 48.0}
[2023-09-28 09:18:44,993][WARNING] this worker ip: 192.168.1.109
[2023-09-28 09:18:44,994][WARNING] Automatically set master ip to local ip address: 192.168.1.109
[2023-09-28 09:18:46,480][INFO] AgentManager initialized
[2023-09-28 09:18:46,514][WARNING] use meta solver type: nash
[2023-09-28 09:18:46,991][INFO] PBTRunner psro initialized
[2023-09-28 09:18:46,991][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-09-28 09:18:46,995][WARNING] use model type: gr_football.built_in_11
(pid=47592) [2023-09-28 09:18:49,787][INFO] DataServer initialized
(pid=47595) [2023-09-28 09:18:49,798][INFO] PolicyServer initialized
[2023-09-28 09:18:50,411][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in
[2023-09-28 09:18:50,426][WARNING] use model type: gr_football.basic_11
[2023-09-28 09:18:50,479][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-09-28 09:18:50,479][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:18:50,523][WARNING] after initialization:
policy_ids:
['built_in_11', 'agent_0-default-0']
populations:
policy_ids:['built_in_11', 'agent_0-default-0']
policy_ids:['built_in_11', 'agent_0-default-0']
[2023-09-28 09:18:50,524][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=47611) [2023-09-28 09:18:51,072][INFO] TrainingManager initialized
(pid=47610) [2023-09-28 09:18:51,149][INFO] RolloutManager initialized
(pid=47606) [2023-09-28 09:19:02,415][INFO] DataPrefetcher initialized
(pid=47599) [2023-09-28 09:19:02,593][INFO] trainer_1 (local rank: 1) initialized
(pid=47609) [2023-09-28 09:19:02,603][INFO] trainer_0 (local rank: 0) initialized
Elo = dict_items([('built_in_11', 1015.631846603239), ('agent_0-default-0', 984.368153396761)])
[2023-09-28 09:30:57,920][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.28, 'lose': 0.28, 'my_goal': 0.43, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 1.0, 'score': 1.0, 'win': 1.0, 'lose': 0.0, 'my_goal': 3.883116883116883, 'goal_diff': 3.883116883116883}],[('agent_0-default-0', 'built_in_11'):{'payoff': -1.0, 'score': 0.0, 'win': 0.0, 'lose': 1.0, 'my_goal': 0.0, 'goal_diff': -3.883116883116883}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 0.0, 'score': 0.5, 'win': 0.25, 'lose': 0.25, 'my_goal': 0.42, 'goal_diff': 0.0}],
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail.
(pid=47605) fig = plt.figure()
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=47605) ax.set_xticklabels([""] + xpid, rotation=90)
(pid=47605) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=47605) ax.set_yticklabels([""] + ypid)
[2023-09-28 09:30:58,519][INFO] payoff table:
+-------------+---------------+-------------+
| | built_in_11 | default-0 |
+=============+===============+=============+
| built_in_11 | +0 | +100 |
+-------------+---------------+-------------+
| default-0 | -100 | +0 |
+-------------+---------------+-------------+
[2023-09-28 09:30:58,520][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id | payoff |
+=============+==========+
| built_in_11 | -100.00 |
+-------------+----------+
| default-0 | +0.00 |
+-------------+----------+
[2023-09-28 09:31:10,202][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-09-28 09:31:10,203][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:31:10,223][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-09-28 09:31:10,223][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={})
(pid=47592) [2023-09-28 09:31:10,243][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}}
(pid=47592) [2023-09-28 09:31:10,248][INFO] DataServer created data table agent_0-default-1
(pid=47610) [2023-09-28 09:31:10,281][INFO] Rollout 1
(pid=47599) [2023-09-28 09:31:10,431][INFO] local_rank: 1 cuda_visible_devices:1
(pid=47609) [2023-09-28 09:31:10,405][INFO] local_rank: 0 cuda_visible_devices:0
(pid=47599) [2023-09-28 09:31:12,242][WARNING] trainer_1 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=47609) [2023-09-28 09:31:12,229][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=47609) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=47609) value = torch.FloatTensor(value)
(pid=47599) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=47599) value = torch.FloatTensor(value)
(pid=47610) [2023-09-28 09:32:56,022][WARNING] save the best model(average reward:-5092.5,average win:0.0)
(pid=47610) [2023-09-28 09:32:56,081][INFO] Rollout 2
(pid=47610) [2023-09-28 09:34:40,549][WARNING] save the best model(average reward:-3465.0,average win:0.0)
(pid=47610) [2023-09-28 09:34:40,601][INFO] Rollout 3
(pid=47611) 2023-09-28 09:35:41,233 ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::DistributedTrainer.optimize() (pid=47599, ip=192.168.1.109, repr=)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
(pid=47611) training_info = self.trainer.optimize(batch)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
(pid=47611) tmp_opt_result = self.loss(mini_batch)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__
(pid=47611) return tensor_cast(
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
(pid=47611) rets = func(*new_args, **kwargs)
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
(pid=47611) values, action_log_probs, dist_entropy = self._evaluate_actions(
(pid=47611) File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
(pid=47611) dist = torch.distributions.Categorical(logits=logits)
(pid=47611) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__
(pid=47611) super().__init__(batch_shape, validate_args=validate_args)
(pid=47611) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__
(pid=47611) raise ValueError(
(pid=47611) ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
(pid=47611) tensor([[nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) ...,
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan],
(pid=47611) [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
(pid=47611) grad_fn=)
(pid=47610) [2023-09-28 09:35:41,283][INFO] Saving model agent_0 agent_0-default-1 3 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-18-44/agent_0/agent_0-default-1/3
Traceback (most recent call last):
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in
main()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 114, in main
runner.run()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run
ray.get(training_task_ref)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=47611, ip=192.168.1.109, repr=)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper
return func(self, *args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train
statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=47609, ip=192.168.1.109, repr=)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
training_info = self.trainer.optimize(batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
tmp_opt_result = self.loss(mini_batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__
return tensor_cast(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
rets = func(*new_args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
values, action_log_probs, dist_entropy = self._evaluate_actions(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
dist = torch.distributions.Categorical(logits=logits)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (40000, 19)) of distribution Categorical(logits: torch.Size([40000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=)
`
I am not sure whether it was a hardware issue, so I tried training with just one TITAN X, but it still generated the following error message:
`
(/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF) lxd@lxd-T630:/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football$ python light_malib/main_pbt.py --config light_malib/expr/gr_football/expr_10_vs_10_psro.yaml
[2023-09-28 09:55:44,004][WARNING] No active cluster detected, will create local ray instance.
[2023-09-28 09:55:52,920][WARNING] ============== Cluster Info ==============
{'node_ip_address': '192.168.1.109', 'raylet_ip_address': '192.168.1.109', 'redis_address': '192.168.1.109:6379', 'object_store_address': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830/sockets/raylet', 'webui_url': None, 'session_dir': '/tmp/ray/session_2023-09-28_09-55-44_005995_37830', 'metrics_export_port': 58593, 'node_id': '0b4c8573ddd5462ff763c6db9c7b0cd22dbe01d81d14b7398a7e5ece'}
[2023-09-28 09:55:52,923][WARNING] * cluster resources:
{'object_store_memory': 17818028851.0, 'GPU': 2.0, 'accelerator_type:G': 1.0, 'node:192.168.1.109': 1.0, 'memory': 35636057703.0, 'CPU': 48.0}
[2023-09-28 09:55:52,923][WARNING] this worker ip: 192.168.1.109
[2023-09-28 09:55:52,924][WARNING] Automatically set master ip to local ip address: 192.168.1.109
[2023-09-28 09:55:54,333][INFO] AgentManager initialized
[2023-09-28 09:55:54,366][WARNING] use meta solver type: nash
[2023-09-28 09:55:54,844][INFO] PBTRunner psro initialized
[2023-09-28 09:55:54,845][INFO] PolicyFactory_agent_0_default new policy ctr starts at -1
[2023-09-28 09:55:54,849][WARNING] use model type: gr_football.built_in_11
(pid=37950) [2023-09-28 09:55:57,624][INFO] PolicyServer initialized
(pid=37956) [2023-09-28 09:55:57,675][INFO] DataServer initialized
[2023-09-28 09:55:58,195][INFO] Load initial policy built_in_11 from light_malib/trained_models/gr_football/11_vs_11/built_in
[2023-09-28 09:55:58,210][WARNING] use model type: gr_football.basic_11
[2023-09-28 09:55:58,257][WARNING] agent_0: agent_0-default-0 is initialized from random
[2023-09-28 09:55:58,257][WARNING] policy agent_0-default-0 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 09:55:58,286][WARNING] after initialization:
policy_ids:
['built_in_11', 'agent_0-default-0']
populations:
policy_ids:['built_in_11', 'agent_0-default-0']
policy_ids:['built_in_11', 'agent_0-default-0']
[2023-09-28 09:55:58,287][WARNING] Evaluation rollouts (num: 50) for 3 policy combinations: [{'agent_0': {'built_in_11': 1.0}, 'agent_1': {'built_in_11': 1.0}}, {'agent_0': {'built_in_11': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}, {'agent_0': {'agent_0-default-0': 1.0}, 'agent_1': {'agent_0-default-0': 1.0}}]
(pid=37940) [2023-09-28 09:55:58,899][INFO] TrainingManager initialized
(pid=37954) [2023-09-28 09:55:58,891][INFO] RolloutManager initialized
(pid=37970) [2023-09-28 09:56:08,109][INFO] trainer_0 (local rank: 0) initialized
(pid=37957) [2023-09-28 09:56:08,385][INFO] DataPrefetcher initialized
Elo = dict_items([('built_in_11', 1015.3241542955467), ('agent_0-default-0', 984.6758457044533)])
[2023-09-28 10:07:43,192][INFO] policy_data: [('built_in_11', 'built_in_11'):{'payoff': 0.0, 'score': 0.5, 'win': 0.27, 'lose': 0.27, 'my_goal': 0.5, 'goal_diff': 0.0}],[('built_in_11', 'agent_0-default-0'):{'payoff': 0.9807692307692307, 'score': 0.9903846153846154, 'win': 0.9807692307692308, 'lose': 0.0, 'my_goal': 4.035256410256411, 'goal_diff': 4.035256410256411}],[('agent_0-default-0', 'built_in_11'):{'payoff': -0.9807692307692308, 'score': 0.009615384615384616, 'win': 0.0, 'lose': 0.9807692307692308, 'my_goal': 0.0, 'goal_diff': -4.035256410256411}],[('agent_0-default-0', 'agent_0-default-0'):{'payoff': 5.551115123125783e-17, 'score': 0.5, 'win': 0.29000000000000004, 'lose': 0.29000000000000004, 'my_goal': 0.44, 'goal_diff': 0.0}],
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:59: UserWarning: Starting a Matplotlib GUI outside of the main thread will likely fail.
(pid=37960) fig = plt.figure()
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:63: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=37960) ax.set_xticklabels([""] + xpid, rotation=90)
(pid=37960) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/monitor/monitor.py:64: UserWarning: FixedFormatter should only be used together with FixedLocator
(pid=37960) ax.set_yticklabels([""] + ypid)
[2023-09-28 10:07:43,815][INFO] payoff table:
+-------------+---------------+-------------+
|             |   built_in_11 |   default-0 |
+=============+===============+=============+
| built_in_11 |            +0 |         +98 |
+-------------+---------------+-------------+
| default-0   |           -98 |          +0 |
+-------------+---------------+-------------+
[2023-09-28 10:07:43,816][INFO] default-0's top 10 worst opponents are:
+-------------+----------+
| policy_id   |   payoff |
+=============+==========+
| built_in_11 |   -98.08 |
+-------------+----------+
| default-0   |    +0.00 |
+-------------+----------+
[2023-09-28 10:07:56,080][WARNING] agent_0: agent_0-default-1 is initialized from last best policy agent_0-default-0
[2023-09-28 10:07:56,081][WARNING] policy agent_0-default-1 uses custom_config: {'gamma': 1.0, 'use_cuda': False, 'use_dueling': False, 'preprocess_mode': 'flatten', 'use_q_head': False, 'ppo_epoch': 5, 'num_mini_batch': 1, 'return_mode': 'new_gae', 'gae': {'gae_lambda': 0.95}, 'vtrace': {'clip_rho_threshold': 1.0, 'clip_pg_rho_threshold': 100.0}, 'use_rnn': False, 'rnn_layer_num': 1, 'rnn_data_chunk_length': 16, 'use_feature_normalization': True, 'use_popart': True, 'popart_beta': 0.99999, 'entropy_coef': 0.0, 'clip_param': 0.2, 'use_modified_mappo': False}
[2023-09-28 10:07:56,107][WARNING] ********** Generation[0] Agent[agent_0] START **********
[2023-09-28 10:07:56,107][INFO] training_desc: TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={})
(pid=37956) [2023-09-28 10:07:56,125][WARNING] table_cfgs:DataServer uses {'capacity': 1000, 'sampler_type': 'lumrf', 'sample_max_usage': 10000, 'rate_limiter_cfg': {'min_size': 8}}
(pid=37956) [2023-09-28 10:07:56,129][INFO] DataServer created data table agent_0-default-1
(pid=37954) [2023-09-28 10:07:56,159][INFO] Rollout 1
(pid=37970) [2023-09-28 10:07:56,375][INFO] local_rank: 0 cuda_visible_devices:0
(pid=37970) [2023-09-28 10:07:57,988][WARNING] trainer_0 reset to training_task TrainingDesc(agent_id='agent_0', policy_id='agent_0-default-1', policy_distributions={'agent_0': {'agent_0-default-1': 1.0}, 'agent_1': OrderedDict([('built_in_11', 0.99999), ('agent_0-default-0', 1e-05)])}, share_policies=True, sync=False, stopper=, kwargs={'cfg': {'distributed': {'resources': {'num_cpus': 1, 'num_gpus': 1, 'resources': {'node:192.168.1.109': 0.01}}}, 'optimizer': 'Adam', 'actor_lr': 0.0005, 'critic_lr': 0.0005, 'opti_eps': 1e-05, 'weight_decay': 0.0, 'lr_decay': False, 'lr_decay_epoch': 2000}})
(pid=37970) /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py:53: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343964576/work/torch/csrc/utils/tensor_numpy.cpp:206.)
(pid=37970) value = torch.FloatTensor(value)
(pid=37954) [2023-09-28 10:09:29,829][WARNING] save the best model(average reward:-5103.75,average win:0.0)
(pid=37954) [2023-09-28 10:09:29,896][INFO] Rollout 2
(pid=37954) [2023-09-28 10:11:04,900][WARNING] save the best model(average reward:-3472.5,average win:0.0)
(pid=37954) [2023-09-28 10:11:04,950][INFO] Rollout 3
(pid=37954) [2023-09-28 10:12:38,904][WARNING] save the best model(average reward:-2661.875,average win:0.0)
(pid=37954) [2023-09-28 10:12:38,938][INFO] Rollout 4
(pid=37954) [2023-09-28 10:14:12,399][WARNING] save the best model(average reward:-2166.5,average win:0.0)
(pid=37954) [2023-09-28 10:14:12,440][INFO] Rollout 5
(pid=37960) Exception ignored in:
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 4016, in __del__
(pid=37960) self.tk.call('image', 'delete', self.name)
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in:
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37970) [2023-09-28 10:15:54,407][WARNING] queue is full. May have bugs in training.
(pid=37960) Exception ignored in:
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in:
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37960) Exception ignored in:
(pid=37960) Traceback (most recent call last):
(pid=37960) File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/tkinter/__init__.py", line 351, in __del__
(pid=37960) if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
(pid=37960) RuntimeError: main thread is not in main loop
(pid=37954) [2023-09-28 10:15:57,987][WARNING] save the best model(average reward:-1838.75,average win:0.0)
(pid=37954) [2023-09-28 10:15:58,037][INFO] Rollout 6
(pid=37954) [2023-09-28 10:17:20,960][WARNING] save the best model(average reward:-1609.642857142857,average win:0.0)
(pid=37954) [2023-09-28 10:17:21,004][INFO] Rollout 7
(pid=37954) [2023-09-28 10:18:54,245][WARNING] save the best model(average reward:-1433.125,average win:0.0)
(pid=37954) [2023-09-28 10:18:54,289][INFO] Rollout 8
(pid=37954) [2023-09-28 10:20:04,518][INFO] Saving model agent_0 agent_0-default-1 8 to /media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/./logs/gr_football/10_vs_10_psro/2023-09-28-09-55-52/agent_0/agent_0-default-1/8
Traceback (most recent call last):
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 126, in <module>
main()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/main_pbt.py", line 114, in main
runner.run()
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/framework/pbt_runner.py", line 106, in run
ray.get(training_task_ref)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/ray/worker.py", line 1625, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TrainingManager.train() (pid=37940, ip=192.168.1.109, repr=)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/decorator.py", line 22, in wrapper
return func(self, *args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/training_manager.py", line 146, in train
statistics_list = ray.get(
ray.exceptions.RayTaskError(ValueError): ray::DistributedTrainer.optimize() (pid=37970, ip=192.168.1.109, repr=)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/training/distributed_trainer.py", line 200, in optimize
training_info = self.trainer.optimize(batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/trainer.py", line 94, in optimize
tmp_opt_result = self.loss(mini_batch)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/common/loss_func.py", line 70, in __call__
return tensor_cast(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/utils/general.py", line 110, in wrap
rets = func(*new_args, **kwargs)
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 143, in loss_compute
values, action_log_probs, dist_entropy = self._evaluate_actions(
File "/media/lxd/0A7AE0627AE04BCF/lzw/football_game/DB-Football/light_malib/algorithm/mappo/loss.py", line 270, in _evaluate_actions
dist = torch.distributions.Categorical(logits=logits)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/categorical.py", line 66, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/media/lxd/880AA9210AA90CEE/anaconda_envs/lzw_GRF/lib/python3.9/site-packages/torch/distributions/distribution.py", line 62, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (80000, 19)) of distribution Categorical(logits: torch.Size([80000, 19])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=)
`
Do you know why this happened?
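In case it helps narrow this down, here is a minimal sketch of a check that could be called just before the `torch.distributions.Categorical(logits=logits)` line in `_evaluate_actions`, to report how much of a tensor is NaN or Inf. The helper name `report_non_finite` is hypothetical and not part of light_malib; it only relies on standard PyTorch calls, and the toy tensor below merely stands in for the real logits.

```python
import torch


def report_non_finite(name, tensor):
    """Return a short summary string if `tensor` holds NaN/Inf values, else None.

    Hypothetical debugging helper, not part of light_malib.
    """
    finite = torch.isfinite(tensor)
    if bool(finite.all()):
        return None
    bad = int((~finite).sum().item())
    return f"{name}: {bad}/{tensor.numel()} non-finite values, shape {tuple(tensor.shape)}"


# Toy stand-in for the logits in the traceback (19 actions, all NaN).
logits = torch.full((4, 19), float("nan"))
msg = report_non_finite("logits", logits)
print(msg)
```

The same call could be applied to the observations, returns, and advantages in the mini-batch, and to the policy parameters after each optimizer step, to see which of them becomes non-finite first. `torch.autograd.set_detect_anomaly(True)` may also help trace NaNs that arise in the backward pass, though it slows training noticeably.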