githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
https://towardsdatascience.com/3797e43a86c
MIT License
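The simplest of the algorithms listed above, best path (greedy) decoding, can be sketched in a few lines: take the argmax per time-step, collapse repeated labels, then drop blanks. A minimal sketch (the blank index is taken as 0 here purely for illustration; TF and this repo place the blank at the last index):

```python
import numpy as np

def best_path_decode(mat, blank=0):
    """Greedy CTC decode: argmax per time-step, collapse repeats, drop blanks.
    mat has shape (T, C), with C including the blank class."""
    best = np.argmax(mat, axis=1)
    out = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            out.append(int(label))
        prev = label
    return out

# toy matrix: T=5 time-steps, C=3 classes (class 0 = blank in this sketch)
mat = np.array([
    [0.1, 0.8, 0.1],  # -> 1
    [0.1, 0.8, 0.1],  # -> 1 (repeat, collapsed)
    [0.8, 0.1, 0.1],  # -> blank
    [0.1, 0.1, 0.8],  # -> 2
    [0.8, 0.1, 0.1],  # -> blank
])
print(best_path_decode(mat))  # [1, 2]
```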

difference between this repo and ctc decoder in tensorflow #12

Closed cosimo17 closed 5 years ago

cosimo17 commented 5 years ago

Thanks for your code. I tested my sequences with both the CTC decoder in TensorFlow and the one in this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes a wrong one. Have you ever compared these two implementations?

githubharald commented 5 years ago

I compared my C++ implementation (basically the same as the Python code in this repo) with the TF beam search decoder and got roughly the same accuracy / char error rate. When using a language model, my beam search decoder usually outperformed the one from TF.

If you can provide a sample for which results are worse than in TF, I can have a look. Provide the input to the decoder (e.g. save the matrix with numpy.save) and also state the expected result.

cosimo17 commented 5 years ago

Oh, thanks for your quick reply.

  1. I use the same beam width (set to 25).
  2. Whether I use softmax or not, the result still differs from TF.
  3. lm is set to None.

This is a sample matrix produced by the network, with shape [60, 1, 64] (np.npy in the zip file): test_matrix.zip

(This is the shape format needed by the TF CTC decoder. You need to squeeze axis 1 before feeding it to your beam search algorithm.)
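For illustration, that reshaping step might look like this (using a random stand-in for the matrix in test_matrix.zip):

```python
import numpy as np

# TF's CTC decoders expect time-major input of shape (T, B, C);
# the decoders in this repo take a single sequence of shape (T, C).
mat_tbc = np.random.rand(60, 1, 64)   # stand-in for test_matrix.npy
mat_tc = np.squeeze(mat_tbc, axis=1)  # drop the batch axis
print(mat_tc.shape)  # (60, 64)
```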

The result given by TF (it is correct): [56 29 34 49 45 51 5 49 32 47 21 24]

The result given by the CTC decoder in this repo (21, 19, 48: three redundant chars are reported): (56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 21, 19, 48)

The result given by the CTC decoder in this repo (with a softmax applied along axis 2 of the sample matrix): (56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48)

I am not familiar with the CTC decoding algorithm; I just use it as a black box. Hoping for your help. Thanks.

githubharald commented 5 years ago

Hi,

just compared the output of my beam search decoder with the one from TF - the results are the same:

My result: [56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48]
TF result: [56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48]
Is equal: True

Code to reproduce:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from BeamSearch import ctcBeamSearch

def softmax(mat):
    "calc softmax such that the labels per time-step form a probability distribution"
    maxT, _ = mat.shape  # dim0=t, dim1=c
    res = np.zeros(mat.shape)
    for t in range(maxT):
        y = mat[t, :]
        e = np.exp(y - np.max(y))  # subtract max for numerical stability
        s = np.sum(e)
        res[t, :] = e / s
    return res

def compare_decoders():
    mat_tbc = np.load('test_matrix.npy')
    mat_sm_tc = softmax(mat_tbc[:,0,:])

    # my decoder: one fake char per non-blank class (the blank is the last class)
    fake_classes = [chr(65+i) for i in range(mat_sm_tc.shape[1]-1)]
    my_res = ctcBeamSearch(mat_sm_tc, classes=fake_classes, beamWidth=25, lm=None)
    my_labels = [fake_classes.index(c) for c in my_res]
    print('My result:', my_labels)

    # TF decoder
    tensor_mat = tf.placeholder(tf.float32, mat_tbc.shape)
    tensor_ctc = tf.nn.ctc_beam_search_decoder(tensor_mat, [mat_tbc.shape[0]], beam_width=25, merge_repeated=False)
    sess = tf.Session()
    tf_res = sess.run(tensor_ctc, {tensor_mat: mat_tbc})
    tf_labels = tf_res[0][0].values.tolist()
    print('TF result:', tf_labels)
    print('Is equal:', tf_labels==my_labels)

    # plot matrix
    plt.imshow(mat_sm_tc)
    plt.xlabel('chars')
    plt.ylabel('time')
    plt.show()

if __name__ == '__main__':
    compare_decoders()

Here is what your input matrix looks like. It seems that in your code you are somehow ignoring the last few time-steps (marked red): some non-blank characters are recognized at the end of the sequence. This is maybe caused by the sequence_length input parameter you pass to the TF decoder.
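To illustrate the pitfall with a hypothetical toy matrix (not the one from the zip file): if the decoder is told the sequence is shorter than it really is, any characters emitted in the trailing time-steps are simply never decoded. A minimal greedy sketch (blank at the last index, as TF uses it; no repeat-collapsing needed for this toy input):

```python
import numpy as np

T, C = 60, 64
mat = np.zeros((T, C))
mat[:, C - 1] = 1.0   # mostly blank (blank = last index, as in TF)
mat[55, 7] = 2.0      # a character near the end of the sequence

def greedy(mat_tc, blank):
    # argmax per time-step, keep non-blank labels
    labels = np.argmax(mat_tc, axis=1)
    return [int(l) for l in labels if l != blank]

print(greedy(mat[:30], blank=C - 1))  # seq_length=30: [] -- char at t=55 is lost
print(greedy(mat, blank=C - 1))       # full length:   [7]
```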

Figure_1

cosimo17 commented 5 years ago

Oh, I understand now. I made a low-level mistake. The seq_length parameter was set to 30, which matched the output length of my first network version. I forgot to change it when I increased the actual network output length from 30 to 60. Your code is right. Thanks for your help; now I can continue my work.