baidu-research / ba-dls-deepspeech

Apache License 2.0
486 stars 174 forks

Different decode results when decode batch_size=1 and >1 #21

Open xinq2016 opened 7 years ago

xinq2016 commented 7 years ago

Found different decode results for the same utterance when decoding with batch_size=1 versus batch_size=16.

When decoding with batch_size=1, the argmax of the network output looks like this:

blank C C blank A B B Z D blank A blank blank blank T T blank

Using the argmax, the result is: cabzdat

but the ground truth is: cat

But when I decode the same utterance with batch_size=16 (there are more than 2 utterances in the test JSON), the result is just "cat".

Why does this happen?

Many thanks Xin.q.
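For reference, the mapping from the argmax frames above to `cabzdat` is standard CTC greedy (best-path) decoding: merge consecutive repeats, then drop blanks. A minimal sketch in plain Python, using the frame sequence from the example above with `_` standing in for blank:

```python
def ctc_greedy_collapse(tokens, blank="_"):
    """Collapse a per-frame argmax sequence: merge consecutive repeats, drop blanks."""
    out = []
    prev = None
    for t in tokens:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return "".join(out)

frames = ["_", "C", "C", "_", "A", "B", "B", "Z", "D",
          "_", "A", "_", "_", "_", "T", "T", "_"]
print(ctc_greedy_collapse(frames))  # CABZDAT
```

So the decoder is behaving correctly here; the extra letters come from the network's per-frame outputs themselves, which is why the batch-normalization explanation below is plausible.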

srvinay commented 7 years ago

This may be due to the batch-normalization layers. Could you retrain a network without batch normalization? Also note that Keras is now at 2.0 and no longer supports the `mode` flag; you could try upgrading to it, since this tutorial is quite old.

xf4fresh commented 7 years ago

@srvinay @xinq2016 I ran into the same problem. When training the model, the parameter mb_size (mini-batch size) defaults to 16, but at test time the predictions change if mb_size is set to other values, such as 1 or 8.

I thought setting `mode` to 0 would solve the problem, but experiments show that it does not.

        mode: integer, 0, 1 or 2.
            - 0: feature-wise normalization.
                Each feature map in the input will
                be normalized separately. The axis on which
                to normalize is specified by the `axis` argument.
                Note that if the input is a 4D image tensor
                using Theano conventions (samples, channels, rows, cols)
                then you should set `axis` to `1` to normalize along
                the channels axis.
                During training we use per-batch statistics to normalize
                the data, and during testing we use running averages
                computed during the training phase.
            - 1: sample-wise normalization. This mode assumes a 2D input.
            - 2: feature-wise normalization, like mode 0, but
                using per-batch statistics to normalize the data during both
                testing and training.
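The docstring above explains the symptom: if normalization uses per-batch statistics at test time (mode 2 behavior, or mode 0 with the learning phase still set to training), the output for a given utterance depends on whatever else happens to be in the batch. A minimal NumPy sketch of the difference, with made-up running statistics (gamma=1, beta=0 for simplicity):

```python
import numpy as np

def batchnorm(x, running_mean, running_var, use_batch_stats, eps=1e-5):
    """Feature-wise batch normalization over axis 0 (gamma=1, beta=0)."""
    if use_batch_stats:   # training-style: statistics from this batch
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:                 # inference-style: fixed running averages
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4))             # one fixed "utterance"
others = rng.normal(size=(15, 4))            # the rest of a 16-item batch
run_mean, run_var = np.zeros(4), np.ones(4)  # made-up running statistics

# Per-batch statistics: the same sample normalizes differently alone vs in a batch.
alone = batchnorm(sample, run_mean, run_var, use_batch_stats=True)
batched = batchnorm(np.vstack([sample, others]), run_mean, run_var,
                    use_batch_stats=True)[:1]

# Running averages: the result is identical regardless of batch composition.
alone_inf = batchnorm(sample, run_mean, run_var, use_batch_stats=False)
batched_inf = batchnorm(np.vstack([sample, others]), run_mean, run_var,
                        use_batch_stats=False)[:1]
```

With per-batch statistics the lone sample even normalizes to all zeros (its batch mean is itself), while the same sample inside a 16-item batch does not; with running averages both paths agree exactly. That is consistent with decode output changing between batch_size=1 and batch_size=16.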

The Keras version used is 1.1.2. If I upgrade Keras to 2.0, how do I modify the code? I would be very grateful for a code snippet; right now I do not know which parts of the project need to change after the upgrade.

reith commented 7 years ago

@xf4fresh Besides dropping mode on v2, have you tried setting the learning phase to False during testing? https://github.com/baidu-research/ba-dls-deepspeech/blob/master/visualize.py#L40