Reference PyTorch program (requires PyTorch v1.0rc1):
```python
import torch
import torch.nn as nn

cuda = torch.device('cuda')

input_length = 50
batch_size = 16
alphabet_size = 20

ctc_loss = nn.CTCLoss(reduction='none').to(cuda)
# Uniform inputs: every class gets probability 1/20 at every time step.
log_probs = torch.ones(input_length, batch_size, alphabet_size).log_softmax(2).detach().requires_grad_()
targets = torch.ones((batch_size,), dtype=torch.long)
input_lengths = torch.full((batch_size,), input_length, dtype=torch.long)
target_lengths = torch.ones((batch_size,), dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

# This is the "probs" argument passed to `cudnnCTCLoss`.
print(log_probs.softmax(2))
print(loss)

# `probs`:
# tensor([[[0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          ...,
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#          [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]]],
#        grad_fn=<SoftmaxBackward>)
# `loss` (not reduced):
# tensor([142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360, 142.6360,
#         142.6360, 142.6360], grad_fn=<CtcLossBackward>)
```
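As a sanity check, the value 142.6360 can be derived by hand: with uniform class probabilities 1/20, every valid alignment of the length-1 target over 50 frames has probability (1/20)^50, and there are 1275 such alignments (leading blanks, at least one label frame, trailing blanks), so the loss is 50·log 20 − log 1275 ≈ 142.636. The minimal forward-algorithm sketch below (plain Python, independent of PyTorch and not part of the original report) reproduces this:

```python
import math

T, K = 50, 20              # input length, alphabet size
logp = math.log(1.0 / K)   # uniform log-probability per frame
NEG_INF = float('-inf')

def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Extended label sequence for the single-label target [1]: [blank, 1, blank].
# alpha[s] = log-probability of all alignments of the first t frames
# that end at extended position s.
alpha = [logp, logp, NEG_INF]  # t = 0: start at the leading blank or the label
for t in range(1, T):
    alpha = [
        alpha[0] + logp,                       # leading blank: stay
        logsumexp(alpha[0], alpha[1]) + logp,  # label: enter from blank or stay
        logsumexp(alpha[1], alpha[2]) + logp,  # trailing blank: enter from label or stay
    ]

# Valid alignments end at the label or the trailing blank.
loss = -logsumexp(alpha[1], alpha[2])
print(loss)  # ≈ 142.636, matching `nn.CTCLoss` above
```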
The PyTorch DeepSpeech model uses a different CTC loss implementation (warp-ctc). We should change the PyTorch model to use the new PyTorch library implementation of CTC loss (`nn.CTCLoss`) for parity; a sketch of the change follows.
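A minimal sketch of what the swap might look like, with placeholder shapes and variable names rather than the actual DeepSpeech training code. The key difference is that warp-ctc consumes raw activations and normalizes internally, while `nn.CTCLoss` expects log-probabilities, so the `log_softmax` must become explicit:

```python
import torch
import torch.nn as nn

input_length, batch_size, alphabet_size = 50, 16, 20

# Placeholder for the raw (unnormalized) network outputs that the
# training loop would previously have passed straight to warp-ctc.
logits = torch.randn(input_length, batch_size, alphabet_size)
targets = torch.ones((batch_size,), dtype=torch.long)
input_lengths = torch.full((batch_size,), input_length, dtype=torch.long)
target_lengths = torch.ones((batch_size,), dtype=torch.long)

# reduction='sum' is an assumption here, chosen to mimic warp-ctc's
# default of summing the loss over the batch.
ctc_loss = nn.CTCLoss(reduction='sum')
loss = ctc_loss(logits.log_softmax(2), targets, input_lengths, target_lengths)
```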
This reference program can be used to verify the `cudnnCTCLoss` wrapper function.

Notes: