clemsciences / ohcr

Image processing to isolate handwritten characters in an image and to learn to recognize them.
MIT License
2 stars 0 forks

documentation on how to use #1

Closed ghost closed 6 years ago

ghost commented 6 years ago

@clemsciences how can your code be used? also can it extract lines?

clemsciences commented 6 years ago

Hey @christophered, I'm refactoring the code in order to:

Right now, the code does not extract lines. It only extracts characters, but with a small modification, it could infer which line each character belongs to.
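As a rough idea of what such a modification might look like, here is a minimal sketch. It assumes each extracted character comes with a bounding box `(x, y, width, height)`; that box format and the function name `group_boxes_into_lines` are hypothetical and not taken from this repository. Characters whose vertical center falls inside the vertical extent of an existing line are attached to that line.

```python
from typing import List, Tuple

# (x, y, width, height) of one character; a hypothetical format,
# not necessarily what this repository produces.
Box = Tuple[int, int, int, int]


def group_boxes_into_lines(boxes: List[Box]) -> List[List[Box]]:
    """Group character boxes into text lines using vertical overlap."""
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom, ordered by vertical center.
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        center = box[1] + box[3] / 2
        for line in lines:
            top = min(b[1] for b in line)
            bottom = max(b[1] + b[3] for b in line)
            # Attach the box to the first line whose extent contains its center.
            if top <= center <= bottom:
                line.append(box)
                break
        else:
            # No existing line matches, so this box starts a new line.
            lines.append([box])
    # Order the characters of each line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


example_boxes = [(50, 10, 8, 12), (10, 12, 8, 10), (30, 40, 8, 12), (10, 42, 8, 10)]
example_lines = group_boxes_into_lines(example_boxes)
```

This heuristic only works for roughly horizontal, well-separated lines; skewed scans or touching lines would need something more robust.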

clemsciences commented 6 years ago

So, the code is now in English, and I'm writing a tutorial on finding lines in images.

ghost commented 6 years ago

@clemsciences thanks

clemsciences commented 6 years ago

What are your needs, actually?

ghost commented 6 years ago

I'm just testing the various text line extraction methods and tools that are available, to see which is most suitable for extracting text lines from the scanned data I have. Later on, the extracted lines will be used to train a new OCR model in Tesseract, ocropy, or kraken. Currently, Ocropus3 has my interest: it uses deep learning for page layout analysis and segmentation. So I think I've already found what I needed. Thank you.

clemsciences commented 6 years ago

Ok! Well, this repository was made for fun, so you won't find state-of-the-art algorithms here.

At least your interest in this domain got me working on it again, which is good.

ghost commented 6 years ago

@clemsciences Thank you for your hard work!

ghost commented 6 years ago

I have tested the seam carving, and it seems its strength is in handwritten line segmentation, while it has some weaknesses on printed documents. Thanks again.