Deepayan137 opened 6 years ago
Hi SeanNaren, I am having the same issue of infinite losses. Do you have any idea on this problem?
I had this problem until I did the following:
I'm using Python 3.6 and PyTorch 0.3.1.
@engrean, do we have to remove all zeros from the 1D label tensor? For example, given batch_size=2, T=3 and label = [[1,0],[2,2]], must we remove the zero and change it to [1,2,2]?
Assuming your zeros mean the _blank that warp-ctc expects and you padded the end of your arrays with zeros, then yes: I removed all trailing zeros and concatenated the non-zero elements into a single array. Your label lengths then need to be [1, 2].
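That flattening step can be sketched like this (a minimal illustration; `flatten_labels` is a hypothetical helper, not part of warp-ctc — in practice you would convert the results to `IntTensor`s before passing them to the loss):

```python
def flatten_labels(padded_labels, blank=0):
    # Drop padding/blank entries from each row and concatenate the
    # remaining labels into the flat 1-D list warp-ctc expects,
    # alongside the per-sample label lengths.
    flat, lengths = [], []
    for row in padded_labels:
        seq = [x for x in row if x != blank]
        flat.extend(seq)
        lengths.append(len(seq))
    return flat, lengths

# label = [[1, 0], [2, 2]]  ->  flat [1, 2, 2], lengths [1, 2]
print(flatten_labels([[1, 0], [2, 2]]))
```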
OK, thank you so much~
Hello everyone, I just wanted to ask: I trained my OCR model on 4850 training photos with variable-length character sequences and their ground truths. I had the inf loss problem and solved it by making the unit step window (the input image width) twice the maximum length of my sequence. Now I get high loss values, around 45 and 46, for both training and validation. Also, if a sequence has 9 characters and my maximum length is 30, I put blanks in the remaining 21 places.
Is this a lack of data, or is the blank padding what causes it? If so, kindly explain the solution above in my case, because I don't get it.
As far as I understand, inf values appear when it is impossible to align the sequences, so the probability calculated by CTC is 0, and its negative log is inf. I had this problem when my dataset contained an example whose label length was only a bit less than the network's output length; once the necessary blanks between doubled symbols were included, its effective length exceeded the maximum possible. Removing this sample helped.
In a second task where I use CTC I have no such examples, yet my loss still becomes inf after 30k iterations. I don't know exactly why, but I will try to work around it by skipping optimizer.step() when loss.value == np.inf.
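That guard can be sketched as follows (`should_step` is a hypothetical helper; the commented loop fragment assumes your own `ctc_loss`/`optimizer` objects):

```python
import math

def should_step(loss_value):
    # Skip the optimizer update when CTC produced a non-finite
    # loss (inf or NaN); backpropagating such a batch would
    # corrupt the weights.
    return math.isfinite(loss_value)

# In the training loop (sketch):
#   loss = ctc_loss(log_probs, targets, input_lens, target_lens)
#   if should_step(loss.item()):
#       optimizer.zero_grad(); loss.backward(); optimizer.step()
print(should_step(42.0), should_step(float("inf")))
```

Newer PyTorch versions of `nn.CTCLoss` also offer a `zero_infinity` flag that zeroes out infinite losses for you, which may be a cleaner fix if that option is available.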
Thanks for the reply. In my case it was a problem of image width vs. sequence length:
the image width should be at least 2N-1 if the sequence length is N; this is how CTC works.
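The 2N-1 lower bound can be made concrete with a tiny helper (hypothetical name, just to show the arithmetic):

```python
def min_input_width(max_label_len):
    # Worst case for CTC: every adjacent pair of labels is
    # identical, so a blank frame is needed between each pair.
    # A length-N target then needs at least 2*N - 1 time steps.
    return 2 * max_label_len - 1

print(min_input_width(30))  # 59
```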
@AhmedKhaled945 I.e., you mean that the number of time steps of the RNN (if I am using the standard OCR approach: CNN + LSTM) should be 2 * n - 1, where n is the maximal length of a sequence (text) in my data?
Yes, as a minimum. If I want to detect a sequence with max length 30, for example, then the input width should be at least 61; it can be more.
@AhmedKhaled945 Thank you
I'm not an expert in the CTC loss function, but my interpretation of this article suggests that the n in 2n - 1 refers to the number of consecutive repetitions in the target. So, generally, model_output > label_length, and if there are consecutive repetitions in the label, then model_output > label_length + 2n - 1, where n is the number of consecutive repetitions in the label.
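One way to make the repetition argument concrete: CTC must emit a blank between two identical consecutive labels (otherwise they collapse into one), so each adjacent repeat adds one required frame. A small sketch under that interpretation (`min_ctc_frames` is a hypothetical helper, not a library function):

```python
def min_ctc_frames(label):
    # One blank frame is required between identical consecutive
    # labels, so the minimum alignable input length is the label
    # length plus the number of adjacent repeats.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

print(min_ctc_frames([1, 2, 2]))  # 4: frames for 1, 2, blank, 2
```

For a label where every adjacent pair repeats, this reduces to the 2N - 1 worst case discussed above.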
Hello, I am trying to train an OCR model which takes a binarized image of a sentence from a document image and tries to predict the output. The loss more often than not becomes infinite after running for a certain number of epochs.
I am not sure where the error is and would be very grateful if someone could point me in the right direction.