Open rustagiadi95 opened 6 years ago
Which dataset do you want to do experiments on? What did you try until now?
https://drive.google.com/open?id=16PwjdAR7UWrovgHNumuLE_9u6Q7uyVj9 https://drive.google.com/open?id=1gkORLYpovnIQ2FNSD6YfPhBYzmhqsMID
These are the two versions of the net you created for the STN-OCR paper. They are practically the same, so you can open either of them. I am working with all of the datasets: both the 32x32 dataset (single label, no bounding boxes) and the variable-size dataset with multiple bounding boxes. I have successfully extracted the data from the second one as well. Next I want to work on the FSNS dataset that you mentioned. I tried to train the net on the 32x32 SVHN dataset, but the training losses are not good. I understand it is the first dataset this net has encountered; I used only 20,000 images of this dataset and 5 epochs. The learning-rate range (0.00001 - 0.0000005) and the optimizer (SGD) you asked me to work with have not shown results up to this point. I am really curious: if I trained it on the full training set (~73K images) of this dataset, would it improve? And if I do, how many epochs should I use? It requires a lot of computing power, which is why I am very cautious about this. Secondly, what should I do to make it almost completely accurate? I know these are a lot of questions, but I think your research is really commendable and deserves appreciation. Please help out.
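As a toy illustration of the learning-rate concern above (pure Python, illustrative only, not the STN-OCR training code): with rates as small as 5e-7, plain SGD barely moves the weights, which matches the flat losses described. Minimising a simple quadratic shows the effect:

```python
# Toy illustration: why a very small learning rate can stall SGD.
# Minimise f(w) = (w - 3)^2 starting from w = 0.
# Illustrative only -- this is not the stn-ocr training code.

def sgd(lr, steps):
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # df/dw
        w -= lr * grad
    return w

# With rates like those mentioned above, progress over a handful of
# updates is negligible:
print(sgd(5e-7, 5))      # ~1.5e-5: barely moved from the start point
print(sgd(1e-5, 5))      # ~3e-4: still essentially at 0
print(sgd(1e-3, 5000))   # with a larger rate and more steps: ~3.0
```

The same stalling shows up in a real network as a loss curve that looks flat even though nothing is actually broken.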
Hmm, looking at your code I can only say the following:

- try to use a lower learning rate like 0.0001 or even 0.00001
- increase your batch size! A batch size of 1 will never work, because the network uses BatchNorm. A batch size of 32 should work quite nicely
- try to use Adam instead of SGD. Adam converges more quickly.
- try to create a similar tool to the BBoxPlotter that I created (you can find it in the insights folder). This tool lets you observe the progress of the training. It does so by using the network to do a prediction on a given image for each iteration of the training. This image is then saved to the hard disk, so you can inspect the state of the network at a given time step. With such a tool you can very quickly determine whether the network diverges or not. This is something you can not directly see from the loss values. So I highly recommend doing this!

Hello Sir, the suggestions that you gave me seem to work well for me; the losses are getting reduced. I want to check whether the model will work on my data, and the training is quite time-consuming, so I would like to have the pre-trained weights. I am attaching a sample of the data. Please look it over and let me know whether the model can detect the text within these images; if it can, please let me know whether you can provide me the pre-trained weights. Within the images, I want to detect the channel name and number.
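The monitoring idea described above can be sketched in a few lines. This is an assumed interface, not the repo's actual BBoxPlotter (which lives in the `insights` folder and renders predicted bounding boxes onto the image); here a dummy `predict` stands in for running the network:

```python
# Sketch of a BBoxPlotter-style training monitor: run the network on the
# same fixed image at a regular interval and save the result to disk, so
# divergence is visible long before the loss curve reveals it.
# Hypothetical names and a dummy model -- not the repo's code.
import os

def predict(model, image):
    # placeholder for a forward pass on one fixed image; in practice this
    # would return (or render) the predicted bounding boxes
    return f"step-{model['step']}: boxes={model['step'] % 4}"

def monitor_training(model, image, out_dir, total_steps, every=1):
    """Save one prediction snapshot per `every` training steps."""
    os.makedirs(out_dir, exist_ok=True)
    for step in range(total_steps):
        model["step"] = step          # stand-in for one optimizer update
        if step % every == 0:
            result = predict(model, image)
            with open(os.path.join(out_dir, f"{step:06d}.txt"), "w") as f:
                f.write(result)

monitor_training({"step": 0}, image=None, out_dir="bbox_progress",
                 total_steps=10, every=2)
```

Flipping through the saved snapshots in order makes it obvious whether the predicted boxes are converging onto the text or drifting off the image.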
Yes I can have a look at some sample data, but you'll need to attach them :wink:
Sorry for that... I mailed you the data at the time. I was wondering: can we train the recognition part of the net individually, without the localisation net?
Oh, you sent me a mail with the data? I don't think I received such a mail... Could you send it again? Of course you can train the recognition part without the localization part, but then your model will not be different from other recognition models. Or am I getting you wrong?
You got me right. Regarding the data, there is no need to trouble you with the hassle of going through it. I understand that my model will not be different from any other model, but in my situation I am already getting localized images, not at the character level, but at the word level within the whole image. I still think I would need the localization part if I want to get the individual characters within a localized word. Anyway, I have some questions that I think I know the answers to, but I want to hear your answers:

q1) How will the LSTM network in the localization net be able to distinguish whether it has already detected the same character/word in a previous timestep? This matters because one has to choose the number of timesteps one expects to be needed for an image.

q2) Will the WHOLE model work on the Chars74K dataset?
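For context on q1: in the STN-OCR design the localization LSTM emits one set of affine transformation parameters per timestep, and the recurrent hidden state is what lets later timesteps condition on what was already attended to. A conceptual sketch with a dummy recurrent cell and hypothetical names, not the repo's code:

```python
# Conceptual sketch: the localization RNN emits one affine transform
# (6 parameters) per timestep; because the hidden state carries over,
# each timestep "knows" what earlier steps already attended to, which is
# how successive steps can avoid re-detecting the same region.
# Dummy cell and hypothetical names -- not the repo's code.

def dummy_rnn_cell(features, hidden):
    # stand-in for an LSTM cell: the new hidden state depends on the old
    # one, so the output changes from one timestep to the next
    new_hidden = [h + f for h, f in zip(hidden, features)]
    theta = new_hidden[:6]            # 6 affine parameters for this step
    return theta, new_hidden

def localize(features, num_timesteps):
    # num_timesteps is the maximum number of regions one expects to find
    hidden = [0.0] * len(features)
    transforms = []
    for _ in range(num_timesteps):
        theta, hidden = dummy_rnn_cell(features, hidden)
        transforms.append(theta)      # each theta drives one crop via a
                                      # spatial transformer (not shown)
    return transforms

feats = [0.1] * 8
thetas = localize(feats, num_timesteps=3)
# each timestep yields a different theta because the hidden state evolved
assert thetas[0] != thetas[1] != thetas[2]
```

The number of timesteps is fixed up front, which is exactly why choosing it to match the expected number of words/characters matters.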
Okay, let me try to answer your questions:
Can you tell me the exact steps to train the model, with all the datasets, how long it should be trained, the learning rates, and so on? Please help me out, brother.