KonduitAI / deeplearning4j

Eclipse Deeplearning4j, ND4J, DataVec and more - deep learning & linear algebra for Java/Scala with GPUs + Spark
http://deeplearning4j.konduit.ai
Apache License 2.0

CTC loss and its gradient: added initial batched CPU implementation and integrated the cuDNN platform implementation #557

Closed quickwritereader closed 2 years ago

quickwritereader commented 3 years ago

What changes were proposed in this pull request?

Batched CTC loss and its gradient implementation for CPU.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Quick checklist

The following checklist helps ensure your PR is complete:

quickwritereader commented 3 years ago

Its test coverage is not sufficient yet; I have only tested the loss. As there could be some typos and errors, it is in a WIP state for now.

Also, I would like to consult with someone on its parameter list:

Currently, it's like this:

auto targetLabels = INPUT_VARIABLE(0);  //{BATCH_LEN, MAX_TARGET_LEN} 
auto logitInput = INPUT_VARIABLE(1); // {BATCH_LEN, FRAME_LEN, CLASS_LEN } CLASS_LEN includes blank label as well
auto targetLabelLengths = INPUT_VARIABLE(2); //{BATCH_LEN}
auto logitInputLengths = INPUT_VARIABLE(3);  //{BATCH_LEN}
int blankIndex = INT_ARG(0); //BLANK INDEX
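
To make those shapes concrete, here is a small NumPy sketch of the expected inputs (the variable names are just illustrative, not part of the op's API):

import numpy as np

BATCH_LEN, FRAME_LEN, CLASS_LEN, MAX_TARGET_LEN = 4, 6, 5, 4

target_labels = np.zeros((BATCH_LEN, MAX_TARGET_LEN), dtype=np.int32)         # INPUT_VARIABLE(0)
logit_input = np.zeros((BATCH_LEN, FRAME_LEN, CLASS_LEN), dtype=np.float32)   # INPUT_VARIABLE(1), log-softmax over CLASS_LEN
target_label_lengths = np.full((BATCH_LEN,), MAX_TARGET_LEN, dtype=np.int32)  # INPUT_VARIABLE(2)
logit_input_lengths = np.full((BATCH_LEN,), FRAME_LEN, dtype=np.int32)        # INPUT_VARIABLE(3)
blank_index = CLASS_LEN - 1                                                   # INT_ARG(0)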

Conditions that will be corrected silently for elements of the target and logit lengths:

    //maxLenT is FRAME_LEN
    //maxLenS is MAX_TARGET_LEN
    lenT = lenT > maxLenT ? maxLenT : lenT;
    lenS = lenS > maxLenS ? maxLenS : lenS;
    if (lenS <= 0 || lenT <= 0) resultLoss = -DataTypeUtils::infOrMax<Type>();
    //lenS should be less than or equal to lenT
    if (lenS > lenT) lenS = lenT;
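
Per batch element, the sanitization amounts to something like the following (a NumPy sketch of the same rule, not the actual kernel code):

import numpy as np

def sanitize_lengths(len_t, len_s, max_len_t, max_len_s):
    # clamp to FRAME_LEN / MAX_TARGET_LEN
    len_t = min(len_t, max_len_t)
    len_s = min(len_s, max_len_s)
    # degenerate lengths yield -inf loss for that batch element
    if len_s <= 0 or len_t <= 0:
        return len_t, len_s, -np.inf
    # the target length may not exceed the number of frames
    if len_s > len_t:
        len_s = len_t
    return len_t, len_s, None

# example: a target length longer than both limits is silently shortened
print(sanitize_lengths(len_t=6, len_s=9, max_len_t=6, max_len_s=4))  # (6, 4, None)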
quickwritereader commented 3 years ago

Here is the Python code that was used to generate a random test case:

import math
import numpy as np
import tensorflow as tf

def softmax(mat):
    "calc softmax such that labels per time-step form probability distribution"
    maxT, _ = mat.shape # dim0=t, dim1=c
    res = np.zeros(mat.shape)
    for t in range(maxT):
        y = mat[t, :]
        e = np.exp(y-np.max(y))
        s = np.sum(e)
        #print(s)
        res[t, :] = e/s
    return res

FRAME_LEN = 6      
CLASS_LEN = 5 
BATCH_LEN = 4  
MAX_TARGET_LEN = 4
MIN_TARGET_LEN = 2
BLANK_INDEX=CLASS_LEN-1

labelseqRand = np.random.randint( 0,high=CLASS_LEN-1,size=(BATCH_LEN,MAX_TARGET_LEN))

logitsRand = np.random.random(size=(BATCH_LEN,FRAME_LEN,CLASS_LEN))

for b in range(logitsRand.shape[0]):
    logitsRand[b] = softmax(logitsRand[b])

#sanity check: class probabilities at each time step should sum to 1
print(np.sum(logitsRand, axis=2))

logitsRand = np.log(logitsRand)

label_length_rnd = np.array([MIN_TARGET_LEN, MIN_TARGET_LEN + 1, MAX_TARGET_LEN, MIN_TARGET_LEN + 1])

logit_length = np.array([FRAME_LEN] * BATCH_LEN)
logits_length=tf.convert_to_tensor(logit_length,dtype=tf.int32)
label_length=tf.convert_to_tensor(label_length_rnd,dtype=tf.int32)
labelseq=tf.convert_to_tensor(labelseqRand,dtype=tf.int32)
logits = tf.convert_to_tensor(logitsRand,dtype=tf.float32)

# check using tf

with tf.GradientTape() as t:
    t.watch(logits)
    #logits_time_major | (optional) If True (default), logits is shaped [time, batch, logits]. If False, shape is [batch, time, logits]
    loss = tf.nn.ctc_loss(labelseq, logits, label_length, logits_length, False,None, BLANK_INDEX,None)
    print(loss)
grads = t.gradient(loss, [logits])
print(grads)
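
For an additional cross-check that does not rely on TF, a plain NumPy implementation of the standard CTC forward (alpha) recursion over the blank-extended label sequence should reproduce the same per-sample losses; this is only a sanity-check sketch, not the C++ implementation:

def ctc_loss_ref(log_probs, labels, label_len, logit_len, blank):
    # log_probs: (FRAME_LEN, CLASS_LEN) log-softmax for one sample
    # labels:    target indices without blanks; only the first label_len entries are used
    ext = np.full(2 * label_len + 1, blank, dtype=np.int64)
    ext[1::2] = labels[:label_len]                # blank, l1, blank, l2, ..., blank
    S = ext.shape[0]
    alpha = np.full((logit_len, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, logit_len):
        for s in range(S):
            cands = [alpha[t - 1, s]]
            if s > 0:
                cands.append(alpha[t - 1, s - 1])
            # skip transition is allowed only for a non-blank that differs from the symbol two steps back
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # the path must end in the last blank or the last label
    return -np.logaddexp(alpha[logit_len - 1, S - 1], alpha[logit_len - 1, S - 2])

ref_loss = [ctc_loss_ref(logitsRand[b], labelseqRand[b], label_length_rnd[b], logit_length[b], BLANK_INDEX)
            for b in range(BATCH_LEN)]
print(ref_loss)  # should be close to the tf.nn.ctc_loss values printed above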