[Closed] kurt-stolle closed this issue 3 years ago
Hello @kurt-stolle, can you share the structure of the files/folders of `textgen` and `translations`? Or, if they are publicly available, where can I find them?
Hi @Ameya-Manas, `textgen` and `translations` are packages that generate label-image pairs via `torch.utils.data.IterableDataset` in the target style of the OCR application, with some augmentations applied (distortions, warps, etc.). Sadly, company policy does not allow me to share these files here.
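For anyone trying to build something similar: below is a minimal sketch of such an `IterableDataset`. This is not the actual `textgen`/`translations` code; the `render` step is a placeholder where the real pipeline would draw the label in the target font and apply distortions and warps.

```python
import random
import string
import torch
from torch.utils.data import IterableDataset

class SyntheticTextDataset(IterableDataset):
    """Yields an endless stream of (image, label) pairs for OCR training."""

    def __init__(self, charset=string.ascii_lowercase, max_len=8, size=(32, 100)):
        self.charset = charset
        self.max_len = max_len
        self.size = size  # (height, width) of the rendered image

    def render(self, label):
        # Placeholder: a real implementation would rasterize `label` in the
        # target font/background style and apply augmentations (warps,
        # distortions, noise). Here we just return a random tensor.
        return torch.rand(1, *self.size)

    def __iter__(self):
        while True:
            n = random.randint(1, self.max_len)
            label = "".join(random.choices(self.charset, k=n))
            yield self.render(label), label

ds = SyntheticTextDataset()
img, label = next(iter(ds))  # img: (1, 32, 100) tensor, label: random string
```

Because the dataset is infinite, you would typically wrap it in a `DataLoader` and stop after a fixed number of iterations per epoch.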
@kurt-stolle Ok no problem. Thank you anyway. :) Your code has still given me some ideas as to how to proceed. :)
"Training pipeline for recognition part is a modified version from this repository [https://github.com/clovaai/deep-text-recognition-benchmark]" (citing the README from https://github.com/JaidedAI/EasyOCR/tree/a5d3053df952ccc411863e1b0d690f3678c9da03)
@piotr-ost Indeed, the code in this issue is too. What is your point?
Hey @kurt-stolle. There is a guide on training on your own dataset, as well as a list of failure cases, under that repository. In the paper they also outline some possible solutions to overcome the failure cases, e.g. low-res images. I faced a similar problem to yours and found it helpful, so I thought it might be worth sharing :)
Hi everyone. I want to get my hands dirty and try to re-train (more likely transfer-learn) EasyOCR to improve its performance on my dataset. Where exactly is the guide you mention, @piotr-ost? In my case, I have low-res images; EasyOCR performs really well in general, but I want to try to improve it depending on each extracted ROI. Any help you can dumb down for me would be really appreciated. My first step is to try to understand the code given by @kurt-stolle, since I have never used PyTorch. Thanks in advance!
Hi there @LanzaMercado, all the training steps are outlined in https://github.com/clovaai/deep-text-recognition-benchmark. If you are struggling with low-resolution images, applying super-resolution might be the way to improve accuracy. Personally, using the RDN_x4 model from the repository https://github.com/yjn870/RDN-pytorch improved my predictions somewhat. Good luck!
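For context, the super-resolution step can be wired in front of the recognizer roughly like this. `sr_model` here stands for whatever pretrained network you load (e.g. an RDN x4 model); bicubic interpolation is used only as a fallback stand-in, not a replacement for a learned model:

```python
import torch
import torch.nn.functional as F

def upscale_roi(roi: torch.Tensor, sr_model=None, scale: int = 4) -> torch.Tensor:
    """Upscale a low-resolution ROI (C, H, W) before passing it to OCR.

    `sr_model` is a hypothetical super-resolution network taking a batched
    (1, C, H, W) tensor; if none is given, fall back to bicubic upsampling.
    """
    x = roi.unsqueeze(0)  # add batch dimension: (1, C, H, W)
    if sr_model is not None:
        with torch.no_grad():
            y = sr_model(x)
    else:
        y = F.interpolate(x, scale_factor=scale, mode="bicubic",
                          align_corners=False)
    return y.squeeze(0)

roi = torch.rand(1, 16, 48)  # a small grayscale crop
hi = upscale_roi(roi)        # upscaled to (1, 64, 192)
```

The upscaled crop is then fed to the recognizer as usual; whether this helps depends on how well the SR model's training data matches your domain.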
@LanzaMercado Sadly, this issue was never resolved. I ended up writing my own OCR library with off-the-shelf networks, using a similar three-stage approach as EasyOCR does.
In my specific use-case, I only need to recognize text whose font is always the same and whose background is always darker than the foreground. Additionally, the character set is smaller than the one that the default `latin.pth` checkpoint is trained on. The default model does not meet my accuracy requirements, even after tuning every parameter. For this reason, a way to improve accuracy would be to fine-tune the model using generated image-text pairs in the target style of my application.

My current solution loads the default `latin.pth` model, but replaces the prediction layer with my own, reflecting the reduced character set. Transfer learning is then performed by training only the weights of this final layer. The training process is set up as follows:

While this already significantly improves accuracy, I would like to go further and also train the remaining layers. I notice, though, that when I try to train all layers simultaneously, the model quickly diverges. It is not clear to me whether this is due to a mistake in the training script or to something else I am not accounting for.
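The freeze-everything-but-the-head setup described above can be sketched as follows. The `Recognizer` class and its layer names are stand-ins, not the actual EasyOCR/deep-text-recognition-benchmark model, and the class counts are assumed values:

```python
import torch
import torch.nn as nn

class Recognizer(nn.Module):
    """Stand-in for the real recognizer, which has Transformation /
    FeatureExtraction / SequenceModeling / Prediction stages."""

    def __init__(self, feat_dim=512, hidden=256, num_classes=97):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.prediction = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.prediction(self.backbone(x))

model = Recognizer()
# state = torch.load("latin.pth", map_location="cpu")  # pretrained weights
# model.load_state_dict(state, strict=False)  # prediction layer shapes differ

# Swap in a smaller prediction head for the reduced character set.
reduced_classes = 40  # assumed: reduced charset size (+ CTC blank)
model.prediction = nn.Linear(model.prediction.in_features, reduced_classes)

# Freeze everything except the new head.
for p in model.backbone.parameters():
    p.requires_grad = False

# Optimize only the trainable (head) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

out = model(torch.rand(3, 512))  # logits of shape (3, 40)
```

On the divergence when unfreezing everything: a common remedy is to give the pretrained layers a much smaller learning rate than the fresh head (via optimizer parameter groups), or to unfreeze gradually, so that large early gradients from the new head do not destroy the pretrained features.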
How can I (re-)train the model, either from scratch or using `latin.pth` as a starting point?