Test coverage is not sufficient yet; I have only tested the loss. Since there could still be typos and errors, it is in a WIP state for now.
Besides, I need someone to consult on its parameter list. Currently it looks like this:
auto targetLabels       = INPUT_VARIABLE(0); // {BATCH_LEN, MAX_TARGET_LEN}
auto logitInput         = INPUT_VARIABLE(1); // {BATCH_LEN, FRAME_LEN, CLASS_LEN}; CLASS_LEN includes the blank label
auto targetLabelLengths = INPUT_VARIABLE(2); // {BATCH_LEN}
auto logitInputLengths  = INPUT_VARIABLE(3); // {BATCH_LEN}
int  blankIndex         = INT_ARG(0);        // index of the blank label
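To make the expected shapes concrete, here is a small NumPy sketch (illustration only, not part of the op) that builds inputs in the order listed above, using the same constants as the test script further down; the snake_case variable names are just mine for readability:
import numpy as np

BATCH_LEN, FRAME_LEN, CLASS_LEN, MAX_TARGET_LEN = 4, 6, 5, 4
BLANK_INDEX = CLASS_LEN - 1  # the test below uses the last class as the blank label

# INPUT_VARIABLE(0): integer label sequences, padded to MAX_TARGET_LEN
target_labels = np.random.randint(0, CLASS_LEN - 1, size=(BATCH_LEN, MAX_TARGET_LEN))
# INPUT_VARIABLE(1): per-frame log-probabilities over all classes, blank included
logit_input = np.log(np.full((BATCH_LEN, FRAME_LEN, CLASS_LEN), 1.0 / CLASS_LEN))
# INPUT_VARIABLE(2): real target length per batch element (<= MAX_TARGET_LEN)
target_label_lengths = np.array([2, 3, 4, 3])
# INPUT_VARIABLE(3): real frame count per batch element (<= FRAME_LEN)
logit_input_lengths = np.full(BATCH_LEN, FRAME_LEN)
# INT_ARG(0): BLANK_INDEX is passed as the single integer argument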
The following conditions are silently corrected for each element of the target and logit lengths:
// maxLenT is FRAME_LEN, maxLenS is MAX_TARGET_LEN
lenT = lenT > maxLenT ? maxLenT : lenT;
lenS = lenS > maxLenS ? maxLenS : lenS;
if (lenS <= 0 || lenT <= 0) resultLoss = -DataTypeUtils::infOrMax<Type>();
// lenS should be less than or equal to lenT
if (lenS > lenT) lenS = lenT;
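To illustrate the effect of these corrections, here is a small Python sketch of the same rules (an illustration with the constants from the test case below, not the actual CPU kernel; the function name is mine):
FRAME_LEN, MAX_TARGET_LEN = 6, 4  # maxLenT and maxLenS

def correct_lengths(len_s, len_t, max_len_s=MAX_TARGET_LEN, max_len_t=FRAME_LEN):
    # clamp both lengths to the padded dimensions
    len_t = min(len_t, max_len_t)
    len_s = min(len_s, max_len_s)
    # non-positive lengths: the op sets resultLoss to -infOrMax for this sample
    if len_s <= 0 or len_t <= 0:
        return None
    # the target cannot be longer than the number of frames
    if len_s > len_t:
        len_s = len_t
    return len_s, len_t

print(correct_lengths(8, 10))  # -> (4, 6): both lengths clamped to the padded sizes
print(correct_lengths(5, 3))   # -> (3, 3): target length reduced to the frame length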
Here is the Python code that was used to generate a random test case:
import math
import numpy as np
import tensorflow as tf
def softmax(mat):
    "calc softmax such that labels per time-step form probability distribution"
    maxT, _ = mat.shape  # dim0=t, dim1=c
    res = np.zeros(mat.shape)
    for t in range(maxT):
        y = mat[t, :]
        e = np.exp(y - np.max(y))
        s = np.sum(e)
        res[t, :] = e / s
    return res
FRAME_LEN = 6
CLASS_LEN = 5
BATCH_LEN = 4
MAX_TARGET_LEN = 4
MIN_TARGET_LEN = 2
BLANK_INDEX = CLASS_LEN - 1
labelseqRand = np.random.randint(0, high=CLASS_LEN - 1, size=(BATCH_LEN, MAX_TARGET_LEN))
logitsRand = np.random.random(size=(BATCH_LEN, FRAME_LEN, CLASS_LEN))
for b in range(logitsRand.shape[0]):
    logitsRand[b] = softmax(logitsRand[b])
# test to see if prob == 1 per time step
assert np.allclose(np.sum(logitsRand, axis=2), 1.0)
logitsRand = np.log(logitsRand)
label_length_rnd = np.array([MIN_TARGET_LEN, MIN_TARGET_LEN + 1, MAX_TARGET_LEN, MIN_TARGET_LEN + 1])
logit_length = np.array([FRAME_LEN]*4)
logits_length = tf.convert_to_tensor(logit_length, dtype=tf.int32)
label_length = tf.convert_to_tensor(label_length_rnd, dtype=tf.int32)
labelseq = tf.convert_to_tensor(labelseqRand, dtype=tf.int32)
logits = tf.convert_to_tensor(logitsRand, dtype=tf.float32)
# check using tf
with tf.GradientTape() as t:
    t.watch(logits)
    # logits_time_major: if True (default), logits is shaped [time, batch, logits]; if False, [batch, time, logits]
    loss = tf.nn.ctc_loss(labelseq, logits, label_length, logits_length, False, None, BLANK_INDEX, None)
print(loss)
grads = t.gradient(loss, [logits])
print(grads)
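As a quick sanity check when porting these values into a C++ unit test, the printed tensors should have the following shapes (assuming the constants above; run right after the script):
# loss is one value per batch element; the gradient matches the logits layout
assert loss.shape == (BATCH_LEN,)                           # (4,)
assert grads[0].shape == (BATCH_LEN, FRAME_LEN, CLASS_LEN)  # (4, 6, 5)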
What changes were proposed in this pull request?
Batched CTC loss and its gradient implementation for CPU.
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
Quick checklist
The following checklist helps ensure your PR is complete: