Reference PyTorch program (requires PyTorch v1.0rc1):
```python
import torch
import torch.nn as nn

cuda = torch.device('cuda')

input_length = 50
batch_size = 16
alphabet_size = 20

ctc_loss = nn.CTCLoss(reduction='none').to(cuda)
# Uniform inputs: every class gets probability 1/20 at every time step.
log_probs = torch.ones(input_length, batch_size, alphabet_size).log_softmax(2).detach().requires_grad_()
targets = torch.ones((batch_size,), dtype=torch.long)
input_lengths = torch.full((batch_size,), input_length, dtype=torch.long)
target_lengths = torch.ones((batch_size,), dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

# This is the "probs" argument passed to `cudnnCTCLoss`.
print(log_probs.softmax(2))
print(loss)

# `probs`:
# tensor([[[0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          ...,
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]]],
#        grad_fn=<SoftmaxBackward>)
# `loss` (not reduced):
# tensor([142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360], grad_fn=<CtcLossBackward>)
```
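As a sanity check, the value 142.6360 can be derived by hand: with uniform class probabilities 1/20, every valid alignment of the length-1 target over 50 frames has probability (1/20)^50, and there are 1275 such alignments (leading blanks, at least one label frame, trailing blanks), so the loss is 50·log 20 − log 1275 ≈ 142.636. The minimal forward-algorithm sketch below (plain Python, independent of PyTorch and not part of the original report) reproduces this:

```python
import math

T, K = 50, 20              # input length, alphabet size
logp = math.log(1.0 / K)   # uniform log-probability per frame
NEG_INF = float('-inf')

def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Extended label sequence for the single-label target [1]: [blank, 1, blank].
# alpha[s] = log-probability of all alignments of the first t frames
# that end at extended position s.
alpha = [logp, logp, NEG_INF]  # t = 0: start at the leading blank or the label
for t in range(1, T):
    alpha = [
        alpha[0] + logp,                       # leading blank: stay
        logsumexp(alpha[0], alpha[1]) + logp,  # label: enter from blank or stay
        logsumexp(alpha[1], alpha[2]) + logp,  # trailing blank: enter from label or stay
    ]

# Valid alignments end at the label or the trailing blank.
loss = -logsumexp(alpha[1], alpha[2])
print(loss)  # ≈ 142.636, matching `nn.CTCLoss` above
```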
The PyTorch DeepSpeech model uses a different CTC loss implementation (warp-ctc). We should change the PyTorch model to use the new PyTorch library implementation of CTC loss (`nn.CTCLoss`) for parity; a sketch of the change follows.
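A minimal sketch of what the swap might look like, with placeholder shapes and variable names rather than the actual DeepSpeech training code. The key difference is that warp-ctc consumes raw activations and normalizes internally, while `nn.CTCLoss` expects log-probabilities, so the `log_softmax` must become explicit:

```python
import torch
import torch.nn as nn

input_length, batch_size, alphabet_size = 50, 16, 20

# Placeholder for the raw (unnormalized) network outputs that the
# training loop would previously have passed straight to warp-ctc.
logits = torch.randn(input_length, batch_size, alphabet_size)
targets = torch.ones((batch_size,), dtype=torch.long)
input_lengths = torch.full((batch_size,), input_length, dtype=torch.long)
target_lengths = torch.ones((batch_size,), dtype=torch.long)

# reduction='sum' is an assumption here, chosen to mimic warp-ctc's
# default of summing the loss over the batch.
ctc_loss = nn.CTCLoss(reduction='sum')
loss = ctc_loss(logits.log_softmax(2), targets, input_lengths, target_lengths)
```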
This reference program can be used to verify the `cudnnCTCLoss` wrapper function.

Notes: