TimZaman / dotaclient

distributed RL spaghetti al arabiata
26 stars, 7 forks

Compute entropy only for valid targets. #27

Closed: TimZaman closed this issue 5 years ago

TimZaman commented 5 years ago

Otherwise this leads to these kinds of bs situations, where it actually optimizes for putting probability on invalid units so that the entropy seems higher.

 'target_unit': tensor([[[3.0567e-02, 2.6279e-02, 2.6279e-02, 2.6279e-02, 2.6279e-02,
          2.6279e-02, 5.9059e-04, 1.3049e-05, 9.7086e-04, 7.9768e-04,
          3.0369e-04, 5.4091e-04, 4.1392e-02, 4.1392e-02, 4.1392e-02,
          4.1392e-02, 4.1392e-02, 4.1392e-02, 4.1392e-02, 4.1392e-02,
          4.1392e-02, 4.1392e-02, 8.5043e-05, 3.6217e-04, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02, 3.0033e-02,
          3.0033e-02, 3.0033e-02, 3.0033e-02]]], grad_fn=<SoftmaxBackward>),
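
A rough sketch of the idea, not the actual dotaclient code: drop the invalid slots, renormalize over the valid ones, and only then compute the entropy. The names `masked_entropy`, `full_entropy` and `valid_mask` are made up for illustration, and plain Python lists stand in for the tensors. The real fix would presumably do the same on the softmax output using whatever validity mask the observation provides.

```python
import math


def full_entropy(probs, eps=1e-12):
    """Entropy over every slot, valid or not (the problematic version)."""
    return -sum(p * math.log(p) for p in probs if p > eps)


def masked_entropy(probs, valid_mask, eps=1e-12):
    """Entropy over valid slots only, renormalized so they sum to 1.

    `valid_mask` is a hypothetical list of booleans, one per target slot.
    """
    valid = [p for p, ok in zip(probs, valid_mask) if ok]
    total = sum(valid)
    if total <= 0.0:
        # No valid targets (or zero mass on them): define entropy as 0.
        return 0.0
    entropy = 0.0
    for p in valid:
        q = p / total  # probability conditioned on the target being valid
        if q > eps:
            entropy -= q * math.log(q)
    return entropy


# Toy example: 2 valid targets and 6 invalid slots that soak up most of the mass.
probs = [0.05, 0.05] + [0.15] * 6
valid_mask = [True, True] + [False] * 6

print("entropy over all slots:  ", full_entropy(probs))
print("entropy over valid slots:", masked_entropy(probs, valid_mask))
```

In the toy example the unmasked entropy is high only because mass is spread over invalid slots, which is exactly what an entropy bonus would then reward. The masked version only sees the two valid targets.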
TimZaman commented 5 years ago

Done