TimZaman / dotaclient

distributed RL spaghetti al arabiata

Negative probabilities selected #20

Closed by TimZaman 5 years ago

TimZaman commented 5 years ago

Probably triggered by action masking, though at its core this looks like a PyTorch bug.

kubectl logs job4-ppo-dotaservice-6f4cc5d688-gfvt5 agent
2019-01-25 08:17:55,147 INFO     main(rmq_host=job4-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-25 08:17:55,178 INFO     setup_model_cb(host=job4-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-25 08:17:55,214 INFO     Received new model: version=3266, size=1207838b
2019-01-25 08:17:55,219 INFO     === Starting Gane 0.
2019-01-25 08:17:55,219 INFO     Starting game.
2019-01-25 08:17:55,229 INFO     Player 0 using weights version 3266
2019-01-25 08:17:55,238 INFO     Player 5 using weights version 3266
2019-01-25 08:18:13,561 INFO     Received new model: version=3267, size=1207838b
Traceback (most recent call last):
  File "agent.py", line 698, in main
    await game.play(game_id=game_id)
  File "agent.py", line 621, in play
    action_pb = player.obs_to_action(obs=obs)
  File "agent.py", line 506, in obs_to_action
    hidden=self.hidden,
  File "agent.py", line 467, in select_action
    action_dict = self.policy.select_actions(head_prob_dict=head_prob_dict)
  File "/root/dotaclient/policy.py", line 181, in select_actions
    action_dict['target_unit'] = cls.sample_action(head_prob_dict['target_unit'])
  File "/root/dotaclient/policy.py", line 161, in sample_action
    return Categorical(probs).sample()
  File "/root/.local/lib/python3.7/site-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:298
2019-01-25 08:19:00,834 ERROR    Unclosed connection: Channel('127.0.0.1', 13337, ..., path=None)
TimZaman commented 5 years ago

Entropy seems extremely low here, e.g.:

{'enum': tensor([[[5.0860e-09, 5.1802e-09, 1.0000e+00]]], grad_fn=<SoftmaxBackward>),
 'target_unit': tensor([[[1.6518e-06, 4.9796e-23, 6.7985e-24, 6.7985e-24, 6.7985e-24,
          6.7985e-24, 9.2425e-01, 1.3570e-05, 7.5734e-02, 3.8476e-21,
          3.8476e-21, 3.8476e-21, 3.8476e-21, 3.8476e-21, 3.8476e-21,
          3.8476e-21, 3.8476e-21, 3.8476e-21, 3.8476e-21, 3.8476e-21,
          3.8476e-21, 3.8476e-21, 9.0723e-19, 4.3352e-19, 3.9908e-14,
          2.5837e-28, 7.8423e-27, 1.2303e-21, 1.2303e-21, 1.2303e-21,
          1.2303e-21, 1.2303e-21, 1.2303e-21, 1.2303e-21, 1.2303e-21,
          1.2303e-21, 1.2303e-21, 1.2303e-21]]], grad_fn=<SoftmaxBackward>),
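
To put a number on "extremely low", here is a minimal sketch that computes the Shannon entropy of the 'enum' head printed above and compares it with the maximum for three outcomes. The `entropy` helper is hypothetical (not from dotaclient) and the values are copied by hand from the dump.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; entries with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# 'enum' head probabilities copied from the dump above.
enum_probs = [5.0860e-09, 5.1802e-09, 1.0000e+00]

h = entropy(enum_probs)
h_max = math.log(len(enum_probs))  # entropy of a uniform 3-way distribution

print(f"entropy = {h:.3e} nats, uniform maximum = {h_max:.3f} nats")
```

The result is on the order of 1e-7 nats against a maximum of about 1.1 nats, so the head is effectively deterministic.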
TimZaman commented 5 years ago

Fixed; added an eps (small epsilon) for now.
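
The actual change is not shown in this thread, so the following is only a sketch of one way an eps can guard the sampling step. It assumes the probabilities may contain tiny negative values or exact zeros after masking; the `eps` default and the clamp-then-renormalise approach are assumptions, not necessarily what was committed to policy.py.

```python
import torch
from torch.distributions import Categorical


def sample_action(probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sample from `probs` after forcing every entry to be at least `eps`.

    Assumption: slightly negative or zero entries are numerical noise,
    so clamping and renormalising barely changes the distribution.
    """
    safe = probs.clamp(min=eps)                   # removes entries < 0
    safe = safe / safe.sum(dim=-1, keepdim=True)  # rows sum to 1 again
    return Categorical(probs=safe).sample()


# A head with a tiny negative entry, shaped like the tensors in the dump.
probs = torch.tensor([[[-1e-9, 0.9, 0.1]]])
action = sample_action(probs)
print(action.shape, int(action.item()))
```

Clamping changes the sampled distribution only by amounts on the order of `eps`, but it also hides whatever produced the negative value in the first place, which fits the "for now" above.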