Open eubinecto opened 4 years ago
loss.backward()
.sample()
# Mask on action_mask itself: testing (y_1 * action_mask) == 0 would also
# mask a legal action whose logit happens to be exactly 0.
neg_inf = torch.full_like(y_1, float('-inf'))
y_1_masked = torch.where(action_mask.bool(), y_1, neg_inf)
# https://discuss.pytorch.org/t/recommended-way-to-replace-a-partcular-value-in-a-tensor/25424
y_3 = F.softmax(y_1_masked, dim=0)  # logits -> probability distribution
return y_3.clone()
this seems to work?
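One way to check this is a small self-contained sketch of the masking step followed by `.sample()` and `loss.backward()`. The function name `masked_policy`, the example logits, and the example mask are all made up for illustration and are not from the actual model; the loss is a plain negative log-probability, used only to exercise the backward pass.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical


def masked_policy(logits: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Softmax over logits, with masked-out actions forced to probability 0."""
    # Condition on the mask itself, so a legal action with logit 0 is kept.
    neg_inf = torch.full_like(logits, float('-inf'))
    masked_logits = torch.where(action_mask.bool(), logits, neg_inf)
    return F.softmax(masked_logits, dim=0)


# Hypothetical inputs: four actions, only actions 0 and 1 are legal.
logits = torch.tensor([2.0, 0.0, -1.0, 0.5], requires_grad=True)
action_mask = torch.tensor([1, 1, 0, 0])

probs = masked_policy(logits, action_mask)
action = Categorical(probs=probs).sample()  # masked actions have probability 0

# Illustrative loss only: negative log-probability of the sampled action.
loss = -torch.log(probs[action])
loss.backward()  # gradients w.r.t. logits stay finite despite the -inf entries
```

Note that action 1 has a logit of exactly 0 and is still sampled with non-zero probability, which is the case a `== 0` test on the multiplied logits would get wrong. If every action is masked, the softmax returns NaN, so the caller would need to guarantee at least one legal action.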
game engine complete. it works like a charm!