DJVUpp / Desktop

Java desktop client
GNU General Public License v3.0
3 stars 2 forks source link

Optimize OCR module specially with Arabic #15

Closed samehissam closed 8 years ago

samehissam commented 8 years ago

1- Comment code using java doc. 2- What is the main problems that take much time in execution. 3- Try to solve this problems with available code. 4- If no change in performance change the available code with other that achieve your goal. 5-Try to solve the problem of the text font that you want to get from the image as it works good with text font more that 10.

Mahmoud-zahran commented 8 years ago

p-picture of demo captura

Mahmoud-zahran commented 8 years ago

OCR Arabic :- In the demo above, only one point-cloud template is loaded for each of the 16 gesture types. we can add additional templates as we wish, and even define our own custom gesture templates.

the algorithms used in this demo ( a straightforward algorithm called $P that represents and recognizes stroke gestures as point clouds)

Mahmoud-zahran commented 8 years ago

steps to solution.

when trying to recognize letter using picture to be ensure the quality of algorithm of detecting .

hint:- point in the algorithm consist of (x,y,stroke id) .

  1. get all pixel from bitmap image using width and height .
    • [x] get the pixels using 2 loops (x of pixel width * y of pixel height) .
  2. get color of each pixel to can detect the letter shape .
    • [x] get the color of pixel using getpixel(x of pixel width ,y of pixel height) .
  3. detect the letter shape from background .
    • [x] convert image to monochrome (0,1) image to can
    • [x] get the point(x,y) of each white pixel or black .
Mahmoud-zahran commented 8 years ago

steps to solution.

second step

algorithm convert any image (the letter points) to XML representation.

  1. In the $p Algorithm we have a class GestureIO has two method ReadGesture() and WriteGesture() .

2.write images of letters as a Gesture letter by letter.

Notes

Mahmoud-zahran commented 8 years ago

the white points on the letter detected as a background and detect the letter like this . out

Mahmoud-zahran commented 8 years ago

OCR problems.

Segmentation.

Mahmoud-zahran commented 8 years ago

Solution Add all small Arabic words can be segmented as (connected word) for our XML dictionary and the classifier can distinguish between words and choose the best one.

classifier power.

final test final test