Splitting the image into blocks, paragraphs, lines and characters.

C-Text / CText

OCR school project. OCR EPITA goes wrong


Splitting the image into blocks, paragraphs, lines and characters. #6

Open Vinetos opened 3 years ago

Vinetos commented 3 years ago

After preprocessing, each image must be split into blocks containing the text.

Inside each block, we must detect:

[x] paragraphs
[x] lines
[x] words
[x] characters

These small matrices will then be sent to the neural network.
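To make the line and character steps concrete, here is a minimal sketch of one common approach, projection profiles. It is not the project's actual code. It assumes the image is already binarized into a list of rows where 1 is ink and 0 is background, and the names `find_runs`, `split_lines`, `split_chars` and `extract_chars` are hypothetical.

```python
def find_runs(profile, threshold=0):
    """Return (start, end) pairs for consecutive entries above threshold (end exclusive)."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def split_lines(image):
    """Find text lines as row ranges, using the amount of ink per row."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)


def split_chars(image, top, bottom):
    """Find characters in one line as column ranges, using ink per column."""
    width = len(image[0]) if image else 0
    col_profile = [
        sum(image[y][x] for y in range(top, bottom)) for x in range(width)
    ]
    return find_runs(col_profile)


def extract_chars(image):
    """Return one small matrix per detected character, in reading order."""
    chars = []
    for top, bottom in split_lines(image):
        for left, right in split_chars(image, top, bottom):
            chars.append([row[left:right] for row in image[top:bottom]])
    return chars
```

This simple version treats any fully blank row or column as a separator, so touching characters or skewed scans would need extra handling. Words could be found the same way by only splitting on column gaps wider than a chosen width.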

starcruiser5289 commented 3 years ago

Paragraph segmentation needs more research. The code works on basic documents such as Word documents, except that it does not recognise paragraphs. All code will be updated to return text once it is linked to the neural network.
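One possible starting point for paragraphs, offered only as an untested idea and not something the project implements: group already detected lines by the vertical gap between them, and start a new paragraph when a gap is clearly larger than the typical one. The function name `group_paragraphs`, the `(top, bottom)` line format and the `factor` value of 1.5 are all assumptions.

```python
from statistics import median


def group_paragraphs(lines, factor=1.5):
    """Group (top, bottom) line ranges into paragraphs using vertical gaps.

    A new paragraph starts when the gap before a line exceeds
    factor times the median gap (a heuristic, not a proven rule).
    """
    if not lines:
        return []
    # Gap between the bottom of one line and the top of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(lines, lines[1:])]
    if not gaps:
        return [list(lines)]               # a single line is one paragraph
    typical = median(gaps)
    paragraphs = [[lines[0]]]
    for gap, line in zip(gaps, lines[1:]):
        if gap > factor * typical:
            paragraphs.append([line])      # large gap: start a new paragraph
        else:
            paragraphs[-1].append(line)    # normal gap: same paragraph
    return paragraphs
```

This would miss paragraphs marked only by indentation, so it is at best a partial answer.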