Holmeyoung / crnn-pytorch

Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.

Test accuracy #18

Open niddal-imam opened 5 years ago

niddal-imam commented 5 years ago

Hi,

I have 70M training samples and 1M validation samples. The test loss keeps decreasing and the accuracy has reached 0.83, but it never exceeds 0.83. I am now at epoch 55, so should I keep waiting, or will it never get better?

mariembenslama commented 5 years ago

Hello, I guess there's some corruption in the data (e.g. the text written on the image is incomplete because the background image is too small for it).

E.g.: the ground truth is ABCDEFGH but the image only shows ABCDEF (the image isn't wide enough for the whole string). So check your data; one automatable proxy for this is sketched below.

Also, could you tell me the rest of the params? I also guess the accuracy will rise further later; it takes time, after all, if the model isn't done generalizing over all the characters.
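One automatable proxy for this kind of truncation is to flag labels that are too long for the CTC output sequence. A minimal sketch, assuming the common CRNN setup where images are resized to 32x100 and the network emits 26 time steps (adjust `T` to your config), and a list file with `<image_path> <label>` per line:

```python
# A minimal sketch (not from this repo) for flagging samples whose label is
# too long for CTC to emit.
T = 26  # CRNN output time steps for a 100px-wide input (assumption)

def too_long_labels(list_file):
    bad = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            path, _, label = line.rstrip("\n").partition(" ")
            # CTC can never output more characters than time steps, and
            # repeated characters also need a blank between them, so the
            # real limit is even tighter than this check.
            if len(label) > T:
                bad.append((path, label))
    return bad

for path, label in too_long_labels("train.txt"):
    print(f"label too long for CTC: {path} -> {label!r}")
```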

niddal-imam commented 5 years ago

Hi Mariem, thank you for your comment. I followed Holmeyoung's method: in stage 1, I did not change the params and trained until the loss started fluctuating; in stage 2, I changed the learning rate, and so on. The loss decreased a little, but the accuracy never exceeded 0.83. The training data is large, and it is a mix of English and Arabic, and of synthetic and cropped images.
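For reference, the stage-2 learning-rate drop can also be automated with PyTorch's built-in scheduler. A generic sketch, not the repo's training loop: the `nn.Linear` is a stand-in for the CRNN and the validation loss is a placeholder.

```python
# A generic sketch of the stage-2 idea: cut the learning rate once the
# validation loss plateaus, instead of restarting training by hand.
import torch
from torch import nn

model = nn.Linear(10, 5)  # stand-in for the CRNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3)  # 10x drop after 3 stale epochs

for epoch in range(55):
    val_loss = max(0.2, 1.0 / (epoch + 1))  # placeholder for the real validation loss
    scheduler.step(val_loss)
    # the current lr is visible via optimizer.param_groups[0]["lr"]
```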

mariembenslama commented 5 years ago

I see. I guess when there are cropped images, the accuracy doesn't add up (that's why it reached 83% and stopped there); this is just from my experience, unless Holmeyoung has another take on this.

niddal-imam commented 5 years ago

Right, the cropped images are probably the reason why it can't improve. Thanks

mariembenslama commented 5 years ago

You're welcome. By the way, can you tell me how long it took to create the 70M images (hours or days)? And, if it doesn't bother you: how much RAM do you have? Thanks ^_^

niddal-imam commented 5 years ago

Actually I have 7M, not 70M; I mistakenly typed 70M :) It took me days because I am new to ML and DL. I reconstructed part of the English Synth90k dataset and built an Arabic synthetic dataset. I also cropped about 2K images. I am using a Tensorbook with 32 GB of DDR4 RAM.

mariembenslama commented 5 years ago

I see ~ Thank you very much :D

mariembenslama commented 5 years ago

Hello again Niddal-imam :))

Can you tell me where to buy a Tensorbook, and how much it costs?

Thanks.

niddal-imam commented 5 years ago

Hi Mariem,

Sure, you can buy it from Lambda's website; the price depends on the memory size. Please check this page for more details: https://lambdalabs.com/deep-learning/laptops/tensorbook

Also, this is a forum where you can read users' reviews: https://deeptalk.lambdalabs.com/t/lambda-tensorbook-specifications/388

If you have more questions, do not hesitate to ask.

mariembenslama commented 5 years ago

Thanks for your answer,

However, I checked the website, and it says that it costs (max) about $3,355.00? How is that calculated?

niddal-imam commented 5 years ago

I bought the premium one and it cost me 3,181 US dollars. However, as it is shipped from the US, you also need to account for duty and tax fees.

mariembenslama commented 5 years ago

Yes, but isn't 3,181 US dollars too little?!? O.O" I mean, it says that's the before-shipping price, but what is the real price? lol (sorry for any inconvenience).

niddal-imam commented 5 years ago

:) As I live in the UK, it cost me an extra 600 US dollars for duty and tax, so the total price was about 4 thousand US dollars. Actually, the price is not that low when you compare it with other "machine learning" laptops, but it is reasonable.

mariembenslama commented 4 years ago

Hello again :) I wanted to ask whether your work has reached a good result on real-life images, and also whether the speed of the Tensorbook has given you quick results? (I'm going to try some AWS instances, so I wanted to ask you ^_^) Thanks.

niddal-imam commented 4 years ago

Hi,

Yes, I was able to improve the recognition from 16% to 46% CRW on real-world test images. I trained the model with over 200k synthetic images; the more images I used, the better recognition I got. Regarding the speed of the Tensorbook, it took me about 3 days to train the model with 200k images for about 10 epochs. Although the Tensorbook is not cheap, it is better than renting a virtual machine and paying per hour. It has saved me a lot of money. I highly recommend it :)

mariembenslama commented 4 years ago

I see, thank you very much for the answer. I hope I can afford it. However, my question is: is it able to recognize any image you give it now? And if we compare it to the Google Vision API, do you recommend Google Vision or this project?

mariembenslama commented 4 years ago

And by more images, do you mean adding more images of the same sample (same composition/shape/representation) or variant images (more different from the synthetic ones)?

niddal-imam commented 4 years ago

Yes, it can recognize real-world images, with 57% accuracy for English and 46% for Arabic. Actually, I have not tried the Google Vision API, but I will compare my results with it. Thank you for the suggestion; I have been looking for a model to compare my results against.

For training, I first generated 100k synthetic samples; after training on synthetic samples only, the model could not recognize real-world images accurately. Then I mixed 100k synthetic and 3,000 real-world samples for training, and the result was better, ~20% CRW. Finally, I used 200k synthetic and 3,000 real-world samples and got 46% CRW. The synthetic images are samples with different text fonts, backgrounds, and text sizes.
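A minimal sketch of that mixing step (file names hypothetical): concatenate the synthetic and real-world sample lists and shuffle them, so every batch contains both domains.

```python
# Merge and shuffle two "<image_path> <label>" list files into one.
import random

lines = []
for list_file in ("synth_train.txt", "real_train.txt"):
    with open(list_file, encoding="utf-8") as f:
        lines.extend(f.readlines())

random.shuffle(lines)  # interleave synthetic and real samples
with open("mixed_train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)
```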

I hope that answers your questions.

mariembenslama commented 4 years ago

Okay, thanks a lot. I did the text-image generation (synthetic) used in this project, with about 5 different fonts and 5 different text styles, but with a very big Japanese text file of 7,000+ characters. I created 10 million images for training and 1 million for testing. I'm going to do the training, but, lol, after what you said I'm wondering if the image composition is varied enough. P.S.: I'm going to use it not on real-world images but on something gray and similar to the text images generated in this project.

What do you think?

niddal-imam commented 4 years ago

In my case, when training the model on synthetic samples, it recognized synthetic test samples with ~80% accuracy. So I think your model can achieve even better accuracy, as you are using a big training dataset.

Good luck.

mariembenslama commented 4 years ago

Yes, but the composition isn't very varied, so I'm worried it probably won't recognize anything I give it later (disregarding the accuracy of the model, I mean).

niddal-imam commented 4 years ago

What do you mean by composition? Do you mean words embedded in the generated images?

mariembenslama commented 4 years ago

Yes, by composition I mean, for example, creating images with 10 different backgrounds instead of 5, so that the model learns different background features. That variety of the data is what I mean by composition. Also different typefaces (embedded in the image), different rotations, etc.; a small sketch of that kind of variation is below.
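A generic sketch of that kind of per-image variation (not the project's generator; requires Pillow): a random background shade plus a small random rotation.

```python
# Vary background shade and apply a slight random skew per generated image.
import random
from PIL import Image

img = Image.new("L", (280, 32), color=random.randint(180, 255))  # varied background
img = img.rotate(random.uniform(-3, 3), expand=True, fillcolor=255)  # slight skew
img.save("augmented_sample.png")
```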

niddal-imam commented 4 years ago

Right, I think the model does not learn background features; it just learns the embedded text given by the label, one sample per line: absolute/path/to/image/xxx.jpg label of xxx.jpg. This is my understanding, and Holmeyoung can correct me if I am wrong. So using 5 or more backgrounds does not help that much, but using more labels (words) helps. In my case, I used different corpora and dictionaries to generate the text.
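A minimal sketch (paths and labels hypothetical) of producing that one-sample-per-line list format:

```python
# Write a "<absolute image path><space><label>" list file.
import os

samples = {"word_0001.jpg": "hello", "word_0002.jpg": "world"}  # toy examples
root = "/absolute/path/to/image"

with open("train.txt", "w", encoding="utf-8") as f:
    for name, label in samples.items():
        f.write(f"{os.path.join(root, name)} {label}\n")
```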

mariembenslama commented 4 years ago

I see. For me, I used a text file with a huge Japanese and English corpus (addresses, cities, names, dates, etc.); the file is 60 MB, and I generated images with random text from it (about 10 characters per image). I created 10 million because, according to Holmeyoung, it should be nb_characters * 1000 images = 7 million, but I made 10 million for training and then 1M for testing.
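A minimal sketch of that generation scheme (font and corpus file names hypothetical; requires Pillow): render ~10 consecutive characters drawn at random from a large corpus file, one grayscale image per sample.

```python
# Sample random 10-character spans from a corpus and render them as images.
import random
from PIL import Image, ImageDraw, ImageFont

with open("corpus.txt", encoding="utf-8") as f:
    corpus = f.read().replace("\n", " ")

font = ImageFont.truetype("NotoSansJP-Regular.otf", 24)  # any CJK-capable font

for i in range(5):  # 10 million in the real run
    start = random.randrange(len(corpus) - 10)
    text = corpus[start:start + 10]
    img = Image.new("L", (280, 32), color=255)  # grayscale, white background
    ImageDraw.Draw(img).text((4, 2), text, font=font, fill=0)
    img.save(f"sample_{i:04d}.jpg")
    print(f"sample_{i:04d}.jpg {text}")  # one list line: path + label
```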

niddal-imam commented 4 years ago

Good, I think you can get good recognition accuracy. A 60 MB corpus rendered with different fonts will definitely help.

Good luck.