OpenPecha / extract-opf-glyphs

MIT License

OCR0052: Automating Glyph Extraction from OPF eText #1

Open 10kalden opened 3 weeks ago

10kalden commented 3 weeks ago

Description:

Implementation:

One approach has already been tried: indexing the text and cropping each character by expanding its bounding poly. However, this was not optimal, and the results were poor.
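
For reference, the "expand the bounding poly and crop" idea can be sketched roughly as below. This is only an illustration, not the code that was actually tried: the function names, the `padding` parameter, and the list-of-rows image representation are all assumptions, and the polygon is taken to be a list of `(x, y)` pixel vertices such as an OCR engine might return.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]          # (x, y) pixel coordinates
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def expand_bounding_box(poly: Sequence[Point], padding: int,
                        width: int, height: int) -> Box:
    """Axis-aligned box around `poly`, grown by `padding` pixels on every
    side and clamped to the image bounds (hypothetical helper)."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    left = max(min(xs) - padding, 0)
    top = max(min(ys) - padding, 0)
    right = min(max(xs) + padding, width)
    bottom = min(max(ys) + padding, height)
    return left, top, right, bottom


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 10x8 "page" and a made-up character polygon.
page = [[0] * 10 for _ in range(8)]
char_poly = [(3, 2), (6, 2), (6, 5), (3, 5)]

box = expand_bounding_box(char_poly, padding=2, width=10, height=8)
glyph = crop(page, box)
```

A fixed `padding` is the weak point of this scheme: too small and stacked or subjoined parts of a glyph are clipped, too large and neighbouring characters leak into the crop.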

Image

Estimation:

Initial Estimation: 4 days (start date: 23-08-24, end date: 28-08-24)

Updated Estimation: 3 days (start date: 03-09-24, end date: 05-09-24)

Sub-task:

Completion Criteria: All glyphs are extracted without the need for manual annotation.

10kalden commented 2 weeks ago

For the character ཅ, in the 4th volume of the Kangyur:

img.1 Original Image

Image

img.2 The character found in the 2nd quadrant

Image

img.3 The character found in the 4th quadrant

Image

10kalden commented 1 week ago

img.1 Original image

Image

img.2 Cropped image for ཅ_1

Image

img.3 Cropped image for ཅ_2

Image

img.4 Cropped image for ཅ_3

Image

10kalden commented 1 week ago

Issue.1 The border around the images is causing problems with correct image segmentation.

Image

Possible Solutions

  1. Use Tesseract to detect the text area
  2. Use OpenCV
  3. Crop the image by a fixed margin on each side
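
As a rough illustration of option 3, plus a slightly more adaptive variant, here is a dependency-free sketch. It is not Tesseract or OpenCV code: `crop_fixed_margin` and `trim_dark_border` are hypothetical helpers, the image is assumed to be a grayscale list of rows (0 = black, 255 = white), and the threshold values are guesses. The adaptive variant only handles a frame that touches the outer edge of the image.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def crop_fixed_margin(image: Image, margin: int) -> Image:
    """Option 3: drop `margin` pixels from every side."""
    height, width = len(image), len(image[0])
    if 2 * margin >= height or 2 * margin >= width:
        raise ValueError("margin would remove the whole image")
    return [row[margin:width - margin] for row in image[margin:height - margin]]


def trim_dark_border(image: Image, dark_below: int = 128,
                     min_dark_fraction: float = 0.9) -> Image:
    """Strip outer rows and columns that are almost entirely dark,
    working inward from each edge until a non-border line is found."""
    def is_border(line: List[int]) -> bool:
        dark = sum(1 for value in line if value < dark_below)
        return dark >= min_dark_fraction * len(line)

    top, bottom = 0, len(image)
    while top < bottom and is_border(image[top]):
        top += 1
    while bottom > top and is_border(image[bottom - 1]):
        bottom -= 1
    rows = image[top:bottom]
    if not rows:
        return []

    def column(index: int) -> List[int]:
        return [row[index] for row in rows]

    left, right = 0, len(rows[0])
    while left < right and is_border(column(left)):
        left += 1
    while right > left and is_border(column(right - 1)):
        right -= 1
    return [row[left:right] for row in rows]


# Toy 6x6 page: a one-pixel black frame around a white interior.
page = [[0] * 6] + [[0] + [255] * 4 + [0] for _ in range(4)] + [[0] * 6]
trimmed = trim_dark_border(page)
fixed = crop_fixed_margin(page, 1)
```

The fixed margin is simplest but breaks when the border position varies between scans; scanning inward adapts per image but can be fooled by a broken or faint frame.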

kaldan007 commented 1 week ago

Test with our layout analysis model.

10kalden commented 5 days ago

Image

img. ཅ_1

Image

img. ཅ_2

Image

img. ཅ_3

Image

img. ཅ_4

Image