HawkAaron / warp-transducer

A fast parallel implementation of RNN Transducer.
Apache License 2.0

Negative value of loss function #14

Closed ZhengkunTian closed 5 years ago

ZhengkunTian commented 5 years ago

Hello, have you ever encountered a situation where the loss function takes negative values? I have no idea why. The inputs to RNNTLoss are as follows:

    loss = RNNTLoss(logits, targets.int(), input_lengths.int(), target_lengths.int())

logits: (batch_size, time_steps, sequence_length, vocab_size)
targets: (batch_size, sequence_length), a 2-dimensional tensor containing all the targets of the batch, zero-padded.

Thanks a lot.

HawkAaron commented 5 years ago

Hi, it's quite strange. Could you provide more details, such as the dataset, framework, which branch you used, and the computing device (CPU or GPU)?

It seems that the logits hadn't been passed through the softmax layer. Did the model converge?

ZhengkunTian commented 5 years ago

Sorry, I think it might be my fault. Thanks a lot!

rzcwade commented 5 years ago

> Hello, have you ever encountered a situation where the loss function takes negative values? I have no idea why. The inputs to RNNTLoss are as follows: loss = RNNTLoss(logits, targets.int(), input_lengths.int(), target_lengths.int()); logits: (batch_size, time_steps, sequence_length, vocab_size); targets: (batch_size, sequence_length), a 2-dimensional tensor containing all the targets of the batch, zero-padded. Thanks a lot.

Hello @ZhengkunTian ,

I have run into similar issue with negative loss value. Could you share about how you fixed your issue?

Thanks!

ZhengkunTian commented 5 years ago

Hi, I'm sorry it's a little late to reply. If you input a sequence of length U into the decoder, you must make sure that the length of the decoder's output sequence is U+1, because a zero is prepended to the input sequence. You can refer to my rnn-transducer code.

For example:

    targets: [batch_size, U]
    decoder_state: [batch_size, 1, U+1, hidden_size]
    RNNTLoss([batch_size, T, U+1, vocab_size], targets, input_len, target_len)
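
As a concrete illustration of that shape relationship, here is a minimal sketch assuming the warprnnt_pytorch binding from this repo; the sizes (T=10, U=5, vocab_size=40) are made up:

    import torch
    from warprnnt_pytorch import RNNTLoss   # PyTorch binding of this repo

    B, T, U, V = 2, 10, 5, 40                               # batch, input frames, label length, vocab (incl. blank)
    acts = torch.randn(B, T, U + 1, V)                      # joint-network output: U+1 along the label axis, not U
    labels = torch.randint(1, V, (B, U)).int()              # targets without blank (index 0), shape [B, U]
    act_lens = torch.tensor([T] * B, dtype=torch.int32)     # lengths of the acoustic sequences
    label_lens = torch.tensor([U] * B, dtype=torch.int32)   # lengths of the label sequences (U, not U+1)

    # GPU version takes raw acts; for the CPU version apply log_softmax to acts first (see later in this thread).
    loss = RNNTLoss()(acts, labels, act_lens, label_lens)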

ZhengkunTian commented 5 years ago

Hi @HawkAaron, can you add several lines of code to check whether the parameters are right or not? Something like this in the transducer:

    if T != max_T:
        raise ValueError("Input length mismatch")
    if U != max_U + 1:
        raise ValueError("Output length mismatch")

It's a good way to help a careless person like me avoid this problem. By the way, can I add you on WeChat? You can send your WeChat ID to my email (zhengkun.tian@nlpr.ia.ac.cn) if you agree.

rzcwade commented 5 years ago

> Hi, I'm sorry it's a little late to reply. If you input a sequence of length U into the decoder, you must make sure that the length of the decoder's output sequence is U+1, because a zero is prepended to the input sequence. You can refer to my rnn-transducer code.
>
> For example: targets: [batch_size, U]; decoder_state: [batch_size, 1, U+1, hidden_size]; RNNTLoss([batch_size, T, U+1, vocab_size], targets, input_len, target_len)

Hi @ZhengkunTian, is your target_len in the RNNTLoss function U+1 as well?

rzcwade commented 5 years ago

Hi @HawkAaron @ZhengkunTian ,

I am using the TF binding for the RNNT loss and my arguments to the function are set as:

    training_loss = RNNTLoss(acts, targets, input_seq_length, target_seq_length_plus1, blank_label=0)

Is the value for blank_label 0 or 39 (for the TIMIT set)?

Thanks!

ZhengkunTian commented 5 years ago

Hi @rzcwade, target_len in the RNNTLoss function is U. I don't know much about the TF version.

ZhengkunTian commented 5 years ago

> Hi @HawkAaron @ZhengkunTian ,
>
> I am using the TF binding for the RNNT loss and my arguments to the function are set as: training_loss = RNNTLoss(acts, targets, input_seq_length, target_seq_length_plus1, blank_label=0) Is the value for blank_label 0 or 39 (for the TIMIT set)?
>
> Thanks!

Hi @rzcwade, using 0 to represent blank is the simplest and most direct way. And I think that target_length in the loss should be U instead of U+1.
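
To make the indexing concrete, here is a tiny sketch of that convention (the numbers assume the 39-phone TIMIT setup mentioned above):

    # Assumed convention: index 0 is the blank, real labels start at 1.
    NUM_PHONES = 39                  # TIMIT 39-phone set
    BLANK = 0
    VOCAB_SIZE = NUM_PHONES + 1      # 40 joint-network outputs: blank + 39 phones
    # a phone with original id p (0..38) is therefore mapped to label p + 1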

rzcwade commented 5 years ago

Hi @ZhengkunTian , @HawkAaron

I tried U instead of U+1 for rnntloss() and my training loss did drop to 0 but kept decreasing. Is the loss supposed to keep dropping?

Thanks!

ZhengkunTian commented 5 years ago

Hi @rzcwade, the loss function is always decreasing but should always be greater than 0. Which dataset do you use? I think there is something wrong in your code; you could check it again. Good luck.

HawkAaron commented 5 years ago

@rzcwade The target_len is always the length of the label sequence without blank. By default, the blank should be 0.

@ZhengkunTian It's a good idea to check the parameters. However, we can't be sure whether the input tensor contains redundant zeros or not, so I don't know if there is a good way to do this check.
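
For instance, with zero-padded targets one way to get those lengths is the following sketch (assuming the pad value equals the blank index 0 and 0 never appears as a real label):

    import torch

    # zero-padded targets, shape (batch_size, max_U)
    targets = torch.tensor([[5, 2, 9, 0, 0],
                            [3, 7, 0, 0, 0]], dtype=torch.int32)
    target_lengths = (targets != 0).sum(dim=1).int()   # [3, 2]: label lengths without blank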

rzcwade commented 5 years ago

> Hi @rzcwade, the loss function is always decreasing but should always be greater than 0. Which dataset do you use? I think there is something wrong in your code; you could check it again. Good luck.

I am using the TIMIT set (39-phone set). Why is it that the loss is always greater than 0?

ZhengkunTian commented 5 years ago

@rzcwade Because training an RNN transducer minimises the log-loss L = -ln P(y|x) (you can find this in the paper). The probability P(y|x) is equal to the sum of the forward-backward products over any top-left to bottom-right diagonal through the lattice nodes. That probability cannot be greater than 1, so the loss cannot be less than 0.
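
Written out (just restating the argument above in symbols):

    L = -\ln P(\mathbf{y}^* \mid \mathbf{x}), \qquad
    0 < P(\mathbf{y}^* \mid \mathbf{x}) \le 1
    \;\Longrightarrow\; L \ge 0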

rzcwade commented 5 years ago

Hi @ZhengkunTian ,

Thank you for your answer. Could it be that something went wrong inside the rnntloss function? I have a tanh and an affine layer (projecting back to the number of classes) as the final layers in the joint network, so the tensors I send into rnntloss are supposed to be in the range (-1, 1). Do you have any suggestion on where I can check my tensor values?

Thanks!

ZhengkunTian commented 5 years ago

Hi @rzcwade, I think it's not a big problem; maybe it's just a dimension error. The inputs of RNNTLoss consist of logits ([batch_size, T, U+1, vocab_size]), labels ([batch_size, U], which may contain redundant zeros), input_lengths ([batch_size]) and target_lengths ([batch_size]). Good luck. If you want to see the details, I think the tests in warp_transducer are very useful!

rzcwade commented 5 years ago

Hi @ZhengkunTian ,

Thank you for your suggestions and support. I haven't successfully debugged it yet. I will keep you posted once I make progress.

Thanks!

rzcwade commented 5 years ago

Hi @HawkAaron ,

I was going through your warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py and realized that your test acts before log_softmax (CPU version) have values in the range 0 to 1. Does that reflect training in a real case? I am testing your CPU version and found that my acts before log_softmax are not all within the 0 to 1 range, so I added a softmax layer to match that case. Does it make sense? Could you give me some input on that?

Thanks!

HawkAaron commented 5 years ago

Hi @rzcwade ,

I had manually applied softmax to the test acts, which makes it easy to check the gradients. In a real case, the acts' values can be any real number.

HawkAaron commented 5 years ago

> Hey,
>
> I have a similar problem with negative loss function values. I have read this thread and made sure that all dimensions are correct. With the same data, the Python implementation from Awni works correctly and the model converges.
>
> Interestingly, the negative loss appears only when blank_label is set to 0 (the model learns, but does not converge as well as with Awni's implementation).
>
> If blank_label is set to vocab_size then there is no negative loss (the loss drops, but WER and CER are very high; what is more, the model produces many wordpieces, which is weird because there are no wordpieces in the input). Could it be that despite setting blank_label to vocab_size the implementation assumes that blank_label = 0? (When blank_label = vocab_size, the wordpiece is set to 0.)
>
> Any help will be appreciated. Thanks!

If you use the CPU version, please apply log_softmax to the acts first. Maybe your acts were not contiguous, or the lengths were not of type int. The parameter checking code has been added; you may pull this repo and retrain your model.
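
A minimal sketch of those three points, assuming the warprnnt_pytorch binding (shapes and values here are illustrative):

    import torch
    import torch.nn.functional as F
    from warprnnt_pytorch import RNNTLoss

    rnnt_loss = RNNTLoss()                                  # blank index defaults to 0

    acts = torch.randn(2, 10, 6, 40)                        # (B, T, U+1, vocab), raw joint-network output
    log_probs = F.log_softmax(acts, dim=-1).contiguous()    # CPU version: log_softmax first, keep contiguous
    labels = torch.randint(1, 40, (2, 5)).int()             # (B, U), int type, no blank (0) among real labels
    act_lens = torch.tensor([10, 8], dtype=torch.int32)     # lengths must be int
    label_lens = torch.tensor([5, 3], dtype=torch.int32)

    loss = rnnt_loss(log_probs, labels, act_lens, label_lens)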