Btw, the log-softmax values and gradients were verified via PyTorch:
import torch
import torch.nn.functional as F
a = torch.Tensor(range(6)).reshape(2, 3)
a.requires_grad = True
b = F.log_softmax(a, dim=1)
b.backward(torch.ones(2, 3))
print(b)
# tensor([[-2.4076, -1.4076, -0.4076],
#         [-2.4076, -1.4076, -0.4076]], grad_fn=<LogSoftmaxBackward>)
print(a.grad)
# tensor([[ 0.7299,  0.2658, -0.9957],
#         [ 0.7299,  0.2658, -0.9957]])
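For reference, the gradient values above can be reproduced by hand from the log-softmax VJP: with an upstream gradient `g`, the input gradient is `g - softmax(x) * sum(g)` per row. A minimal NumPy sketch (variable names here are illustrative, not from the original):

```python
import numpy as np

# Same input as the PyTorch snippet above: [[0, 1, 2], [3, 4, 5]].
x = np.arange(6, dtype=np.float64).reshape(2, 3)
upstream = np.ones_like(x)  # b.backward(torch.ones(2, 3)) uses an all-ones upstream gradient.

# Numerically stable softmax / log-softmax along the last axis.
shifted = x - x.max(axis=1, keepdims=True)
denom = np.exp(shifted).sum(axis=1, keepdims=True)
softmax = np.exp(shifted) / denom
log_softmax = shifted - np.log(denom)

# VJP of log_softmax: upstream - softmax * (row sum of upstream).
grad = upstream - softmax * upstream.sum(axis=1, keepdims=True)

print(log_softmax)  # ~[[-2.4076, -1.4076, -0.4076], [-2.4076, -1.4076, -0.4076]]
print(grad)         # ~[[ 0.7299,  0.2658, -0.9957], [ 0.7299,  0.2658, -0.9957]]
```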
Low-priority todos:

- Add `softmax(axis: Int)`, matching other reduction ops (a rough sketch of the axis semantics follows below).
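Regarding the `softmax(axis: Int)` todo, the intended behavior (normalize along a chosen dimension, like other reduction ops) could look like this rough NumPy sketch; the helper here is purely illustrative, not the library's actual API:

```python
import numpy as np

def softmax(x, axis=-1):
    # Hypothetical helper sketching axis-parameterized softmax semantics:
    # subtract the max along `axis` for numerical stability, then normalize.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

a = np.arange(6, dtype=np.float64).reshape(2, 3)
print(softmax(a, axis=1))  # each row sums to 1
print(softmax(a, axis=0))  # each column sums to 1
```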