Deepayan137 opened 6 years ago
Hi SeanNaren, I am having the same issue of infinite losses. Do you have any idea on this problem?
I had this problem until I did the following:
I'm using Python 3.6 and PyTorch 0.3.1.
@engrean, do we have to remove all zeros from the 1D label tensor? For example, given batch_size=2, T=3 and label = [[1,0],[2,2]], must we remove the zero and change it to [1,2,2]?
Assuming your zeros mean the _blank that warp-ctc expects and you padded the end of your arrays with zeros, then yes: I removed all trailing zeros and concatenated the non-zero elements into a single array. Your label lengths then need to be [1, 2].
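That flattening step can be sketched like this (a minimal illustration; `flatten_labels` is a hypothetical helper, not part of warp-ctc — in practice you would convert the results to `IntTensor`s before passing them to the loss):

```python
def flatten_labels(padded_labels, blank=0):
    # Drop padding/blank entries from each row and concatenate the
    # remaining labels into the flat 1-D list warp-ctc expects,
    # alongside the per-sample label lengths.
    flat, lengths = [], []
    for row in padded_labels:
        seq = [x for x in row if x != blank]
        flat.extend(seq)
        lengths.append(len(seq))
    return flat, lengths

# label = [[1, 0], [2, 2]]  ->  flat [1, 2, 2], lengths [1, 2]
print(flatten_labels([[1, 0], [2, 2]]))
```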
OK, thank you so much~
Hello everyone, I just wanted to ask: I trained my OCR model on 4850 training photos with variable-length character sequences and their ground truths. I had the inf loss problem and solved it by making the unit step window (the input image width) twice the maximum length of my sequence. Now I get high loss values, around 45 and 46, for both training and validation. Also, if a sequence has 9 characters and my maximum length is 30, I put blanks in the remaining 21 places.
Is this a lack of data, or is the blank padding what causes it? If so, kindly explain the solution above in my case, because I don't get it.
As far as I understand, inf values appear when it is impossible to align the sequences, so the probability calculated by CTC is 0, and its negative log is inf. I had this problem when my dataset contained an example whose label length was only a bit less than the network's output length; once the necessary blanks between doubled symbols were included, its effective length exceeded the maximum possible. Removing this sample helped.
In a second task where I use CTC I have no such examples, yet my loss still becomes inf after 30k iterations. I don't know exactly why, but I will try to work around it by skipping optimizer.step() when loss.value == np.inf.
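That guard can be sketched as follows (`should_step` is a hypothetical helper; the commented loop fragment assumes your own `ctc_loss`/`optimizer` objects):

```python
import math

def should_step(loss_value):
    # Skip the optimizer update when CTC produced a non-finite
    # loss (inf or NaN); backpropagating such a batch would
    # corrupt the weights.
    return math.isfinite(loss_value)

# In the training loop (sketch):
#   loss = ctc_loss(log_probs, targets, input_lens, target_lens)
#   if should_step(loss.item()):
#       optimizer.zero_grad(); loss.backward(); optimizer.step()
print(should_step(42.0), should_step(float("inf")))
```

Newer PyTorch versions of `nn.CTCLoss` also offer a `zero_infinity` flag that zeroes out infinite losses for you, which may be a cleaner fix if that option is available.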
Thanks for the reply. In my case it was a problem of image width vs. sequence length:
the image width should be at least 2N-1 if the sequence length is N; this is how CTC works.
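The 2N-1 lower bound can be made concrete with a tiny helper (hypothetical name, just to show the arithmetic):

```python
def min_input_width(max_label_len):
    # Worst case for CTC: every adjacent pair of labels is
    # identical, so a blank frame is needed between each pair.
    # A length-N target then needs at least 2*N - 1 time steps.
    return 2 * max_label_len - 1

print(min_input_width(30))  # 59
```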
@AhmedKhaled945 I.e., you mean that the number of time steps of the RNN (if I am using the standard OCR approach: CNN + LSTM) should be 2 * n - 1, where n is the maximal length of a sequence (text) in my data?
Yes, as a minimum. If I want to detect a sequence with max length 30, for example, then the input width should be at least 61; it can be more.
@AhmedKhaled945 Thank you
I'm not an expert in the CTC loss function, but my interpretation of this article suggests that the n in 2n - 1 refers to the number of consecutive repetitions in the target. So, generally, model_output > label_length, and if there are consecutive repetitions in the label, then model_output > label_length + 2n - 1, where n is the number of consecutive repetitions in the label.
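One way to make the repetition argument concrete: CTC must emit a blank between two identical consecutive labels (otherwise they collapse into one), so each adjacent repeat adds one required frame. A small sketch under that interpretation (`min_ctc_frames` is a hypothetical helper, not a library function):

```python
def min_ctc_frames(label):
    # One blank frame is required between identical consecutive
    # labels, so the minimum alignable input length is the label
    # length plus the number of adjacent repeats.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

print(min_ctc_frames([1, 2, 2]))  # 4: frames for 1, 2, blank, 2
```

For a label where every adjacent pair repeats, this reduces to the 2N - 1 worst case discussed above.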
Hello, I am trying to train an OCR model which takes a binarized image of a sentence from a document image and tries to predict the output. The loss more often than not becomes infinite after running for a certain number of epochs.
I am not sure where the error is and would be very grateful if someone could point me in the right direction.