Open rustagiadi95 opened 6 years ago
Which dataset do you want to do experiments on? What did you try until now?
https://drive.google.com/open?id=16PwjdAR7UWrovgHNumuLE_9u6Q7uyVj9 https://drive.google.com/open?id=1gkORLYpovnIQ2FNSD6YfPhBYzmhqsMID
These are the two versions of the net you created for the STN-OCR paper. They are practically the same, so you can open either of them. I am working with all of the datasets: both the 32x32 dataset (single label, no bounding boxes) and the variable-size dataset with multiple bounding boxes. I have successfully extracted the data from the second one as well. Next I want to work on the FSNS dataset that you mentioned. I tried to train the net on the 32x32 SVHN dataset, but the training losses are not good. I understand it is the first dataset this net has encountered; I used only 20,000 images of this dataset and 5 epochs. The learning-rate range (0.00001 - 0.0000005) and the optimizer (SGD) you asked me to work with have not shown results up to this point. I am really curious: if I trained it on the full training set (~73K images) of this dataset, would it improve? And if I do, how many epochs should I use? It requires a lot of computing power, which is why I am very cautious about this. Secondly, what should I do to make it almost completely accurate? I know these are a lot of questions, but I think your research is really commendable and deserves appreciation. Please help out.
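As a toy illustration of the learning-rate concern above (pure Python, illustrative only, not the STN-OCR training code): with rates as small as 5e-7, plain SGD barely moves the weights, which matches the flat losses described. Minimising a simple quadratic shows the effect:

```python
# Toy illustration: why a very small learning rate can stall SGD.
# Minimise f(w) = (w - 3)^2 starting from w = 0.
# Illustrative only -- this is not the stn-ocr training code.

def sgd(lr, steps):
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # df/dw
        w -= lr * grad
    return w

# With rates like those mentioned above, progress over a handful of
# updates is negligible:
print(sgd(5e-7, 5))      # ~1.5e-5: barely moved from the start point
print(sgd(1e-5, 5))      # ~3e-4: still essentially at 0
print(sgd(1e-3, 5000))   # with a larger rate and more steps: ~3.0
```

The same stalling shows up in a real network as a loss curve that looks flat even though nothing is actually broken.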
Hmm, looking at your code I can only say the following:

- try to use a lower learning rate like 0.0001 or even 0.00001
- increase your batch size! A batch size of 1 will never work, because the network uses BatchNorm. A batch size of 32 should work quite nicely
- try to use Adam instead of SGD. Adam converges more quickly.
- try to create a similar tool to the BBoxPlotter that I created (you can find it in the insights folder). This tool lets you observe the progress of the training. It does so by using the network to do a prediction on a given image for each iteration of the training. This image is then saved to the hard disk, so you can inspect the state of the network at a given time step. With such a tool you can very quickly determine whether the network diverges or not. This is something you can not directly see from the loss values. So I highly recommend doing this!

Hello Sir, the suggestions that you gave me seem to work well for me; the losses are getting reduced. I want to check whether the model will work on my data, and the training is quite time-consuming, so I would like to have the pre-trained weights. I am attaching a sample of the data. Please look it over and let me know whether the model can detect the text within these images; if it can, please let me know whether you can provide me the pre-trained weights. Within the images, I want to detect the channel name and number.
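The monitoring idea described above can be sketched in a few lines. This is an assumed interface, not the repo's actual BBoxPlotter (which lives in the `insights` folder and renders predicted bounding boxes onto the image); here a dummy `predict` stands in for running the network:

```python
# Sketch of a BBoxPlotter-style training monitor: run the network on the
# same fixed image at a regular interval and save the result to disk, so
# divergence is visible long before the loss curve reveals it.
# Hypothetical names and a dummy model -- not the repo's code.
import os

def predict(model, image):
    # placeholder for a forward pass on one fixed image; in practice this
    # would return (or render) the predicted bounding boxes
    return f"step-{model['step']}: boxes={model['step'] % 4}"

def monitor_training(model, image, out_dir, total_steps, every=1):
    """Save one prediction snapshot per `every` training steps."""
    os.makedirs(out_dir, exist_ok=True)
    for step in range(total_steps):
        model["step"] = step          # stand-in for one optimizer update
        if step % every == 0:
            result = predict(model, image)
            with open(os.path.join(out_dir, f"{step:06d}.txt"), "w") as f:
                f.write(result)

monitor_training({"step": 0}, image=None, out_dir="bbox_progress",
                 total_steps=10, every=2)
```

Flipping through the saved snapshots in order makes it obvious whether the predicted boxes are converging onto the text or drifting off the image.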
Yes I can have a look at some sample data, but you'll need to attach them :wink:
Sorry for that... I mailed you the data at the time. I was wondering: can we train the recognition part of the net individually, without the localisation net?
Oh, you sent me a mail with the data? I don't think I received such a mail... Could you send it again? Of course you can train the recognition part without the localization part, but then your model will not be different from other recognition models. Or am I getting you wrong?
You got me right. Regarding the data, there is no need to trouble you with the hassle of going through it. I understand that my model will not be different from any other model, but in my situation I am already getting localized images, not at the character level, but at the word level within the whole image. I still think I would need the localization part if I want to get the individual characters within a localized word. Anyway, I have some questions that I think I know the answers to, but I want to hear your answers:

q1) How will the LSTM network in the localization net be able to distinguish whether it has already detected the same character/word in a previous timestep? This matters because one has to choose the number of timesteps one expects to be needed for an image.

q2) Will the WHOLE model work on the Chars74K dataset?
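For context on q1: in the STN-OCR design the localization LSTM emits one set of affine transformation parameters per timestep, and the recurrent hidden state is what lets later timesteps condition on what was already attended to. A conceptual sketch with a dummy recurrent cell and hypothetical names, not the repo's code:

```python
# Conceptual sketch: the localization RNN emits one affine transform
# (6 parameters) per timestep; because the hidden state carries over,
# each timestep "knows" what earlier steps already attended to, which is
# how successive steps can avoid re-detecting the same region.
# Dummy cell and hypothetical names -- not the repo's code.

def dummy_rnn_cell(features, hidden):
    # stand-in for an LSTM cell: the new hidden state depends on the old
    # one, so the output changes from one timestep to the next
    new_hidden = [h + f for h, f in zip(hidden, features)]
    theta = new_hidden[:6]            # 6 affine parameters for this step
    return theta, new_hidden

def localize(features, num_timesteps):
    # num_timesteps is the maximum number of regions one expects to find
    hidden = [0.0] * len(features)
    transforms = []
    for _ in range(num_timesteps):
        theta, hidden = dummy_rnn_cell(features, hidden)
        transforms.append(theta)      # each theta drives one crop via a
                                      # spatial transformer (not shown)
    return transforms

feats = [0.1] * 8
thetas = localize(feats, num_timesteps=3)
# each timestep yields a different theta because the hidden state evolved
assert thetas[0] != thetas[1] != thetas[2]
```

The number of timesteps is fixed up front, which is exactly why choosing it to match the expected number of words/characters matters.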
Okay, let me try to answer your questions:
Can you tell me the exact steps to train the model, with all the datasets, how long it should be trained, the learning rates, and so on? Please help me out, brother.