apsexton / bateman-ocr

Tools and experiments in the OCR of the Bateman Manuscripts
ISC License

Reconstruct image from connected components #8

Status: open (apsexton opened this issue 8 years ago)

apsexton commented 8 years ago

data/sample.tif is a multi-page TIFF file containing 5 images. The data/sample_css folder contains moments.csv and one subfolder for each image in sample.tif. For this task we are only concerned with the first image; later tasks will extend the system to handle the other images.

moments.csv contains one line for each connected component (CC) of the images in sample.tif.
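A minimal sketch of reading moments.csv so that every column of every CC row is kept, and then selecting only the rows for the first image. The issue doesn't give the file's layout, so this assumes a header row, and the column names used here (image, cc, left, top, width, height) are hypothetical placeholders:

```python
import csv
import io
from typing import Dict, List

def read_moments(stream) -> List[Dict[str, str]]:
    """Read every row of a moments.csv-style file, keeping all columns.

    Each row describes one connected component. Values stay as strings so
    nothing is lost before we decide how each column will be used.
    Assumes the first line is a header row (not confirmed by the issue).
    """
    return [dict(row) for row in csv.DictReader(stream)]

# Hypothetical sample data; the real column names may differ.
sample = io.StringIO(
    "image,cc,left,top,width,height\n"
    "0,0,10,20,5,7\n"
    "0,1,30,22,4,6\n"
    "1,0,12,18,3,3\n"
)
rows = read_moments(sample)

# For this task only the first image matters.
first_image = [r for r in rows if r["image"] == "0"]
print(len(rows), len(first_image))
```

With the real file you would pass `open("data/sample_css/moments.csv", newline="")` instead of the in-memory sample.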

The subfolders of sample_css contain a TIFF image for each connected component listed in moments.csv. These images don't display well in most image viewers because, although their foreground colour is black, their background is fully transparent rather than the usual white. You can see them clearly if you open them in GIMP.
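To see (or draw) a CC image the way it would look on paper, each pixel can be composited over a white background with the standard "over" operator. This sketch assumes the pixels have already been decoded to 8-bit RGBA tuples; the actual sample depth and alpha encoding of these TIFFs are not stated in the issue:

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]

def over_white(pixel: RGBA) -> RGB:
    """Composite one RGBA pixel over an opaque white background."""
    r, g, b, a = pixel
    alpha = a / 255.0
    # out = src * alpha + white * (1 - alpha), per colour channel
    return tuple(round(c * alpha + 255 * (1 - alpha)) for c in (r, g, b))

def flatten(image: List[List[RGBA]]) -> List[List[RGB]]:
    """Apply over_white to every pixel of a row-major image."""
    return [[over_white(p) for p in row] for row in image]

# Opaque black foreground stays black; fully transparent background turns white.
cc = [[(0, 0, 0, 255), (0, 0, 0, 0)]]
print(flatten(cc))
```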

When an image is loaded, the toggle action for switching between the loaded and the generated image should become available (issue #7). When it is triggered, draw the first page (i.e. the first image) not by drawing the image from sample.tif, but by loading the individual CC images for that page and drawing each one in its correct position. The result should be identical to drawing the first image from sample.tif.
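The reconstruction step can be sketched as: start from a blank white page, then copy only the opaque pixels of each CC image onto it at that CC's offset. This is an illustration under assumptions, not the project's code: each CC is reduced to a boolean mask, and the `left`/`top` offsets are hypothetical stand-ins for whatever position data moments.csv actually provides.

```python
from typing import List, NamedTuple

Page = List[List[int]]  # greyscale: 0 = black, 255 = white

class Component(NamedTuple):
    left: int  # hypothetical: x offset of the CC's bounding box on the page
    top: int   # hypothetical: y offset of the CC's bounding box on the page
    mask: List[List[bool]]  # True where the CC pixel is opaque (black)

def reconstruct(width: int, height: int, components: List[Component]) -> Page:
    """Rebuild a page by drawing each CC's opaque pixels at its offset."""
    page = [[255] * width for _ in range(height)]
    for cc in components:
        for dy, row in enumerate(cc.mask):
            for dx, opaque in enumerate(row):
                if opaque:  # transparent pixels leave the page untouched
                    page[cc.top + dy][cc.left + dx] = 0
    return page

# A tiny 4x3 "original" page containing two vertical strokes.
original = [
    [255, 0, 255, 255],
    [255, 0, 255, 0],
    [255, 255, 255, 0],
]
ccs = [
    Component(left=1, top=0, mask=[[True], [True]]),
    Component(left=3, top=1, mask=[[True], [True]]),
]
rebuilt = reconstruct(4, 3, ccs)
print(rebuilt == original)
```

Drawing only opaque pixels is why the transparent background matters: bounding boxes of neighbouring CCs can overlap, and a white background would erase parts of components drawn earlier.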

To do this:

kevinshen100 commented 8 years ago

I have completed this. Note: loading all the TIFF components at the beginning takes quite a while (roughly 10 seconds), but after that it is much faster (under 1 second).
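One common way to get this "slow first time, fast afterwards" behaviour is to cache each decoded component by path, so the files are only read once. This is a generic sketch, not necessarily how the patch does it; `_read_component_from_disk` is a hypothetical placeholder for the real TIFF decoding:

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how often we actually hit the disk

def _read_component_from_disk(path: str) -> bytes:
    """Hypothetical stand-in for decoding one CC TIFF file (the slow part)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return path.encode()  # placeholder payload

@lru_cache(maxsize=None)
def load_component(path: str) -> bytes:
    # First call per path reads the file; later calls return the cached result.
    return _read_component_from_disk(path)

paths = ["cc_0.tif", "cc_1.tif"]  # hypothetical file names
for _ in range(3):  # e.g. toggling the generated view three times
    for p in paths:
        load_component(p)
print(LOAD_COUNT)
```

The trade-off is memory: every decoded component stays in memory for the lifetime of the program.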

Also, I don't have a proper icon for the toggle button yet; for now I have just reused the exit button's icon.

kevinshen100 commented 8 years ago

By the way, when the CSV file is read, do you want all the information to be read in, or just the needed information?

apsexton commented 8 years ago

Might as well read all the information in: there are ways we can use it in the future, and having it all loaded will make it easier to write out the full information, with the extra ground-truth labels, as we ground-truth it.
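If every column is kept on read, writing the data back out with an extra ground-truth column is straightforward. A minimal sketch, assuming rows are dictionaries of the original columns; the column name `label` and the index-to-label mapping are hypothetical:

```python
import csv
import io
from typing import Dict, List

def write_with_labels(rows: List[Dict[str, str]], labels: Dict[int, str], out) -> None:
    """Write every original column back out, plus a ground-truth label column.

    `labels` maps row index -> label; unlabelled rows get an empty string.
    """
    fieldnames = (list(rows[0].keys()) if rows else []) + ["label"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for i, row in enumerate(rows):
        writer.writerow({**row, "label": labels.get(i, "")})

rows = [{"cc": "0", "left": "10"}, {"cc": "1", "left": "30"}]
buf = io.StringIO()
write_with_labels(rows, {1: "e"}, buf)
print(buf.getvalue())
```

Because the original columns are written unchanged, the output stays a superset of moments.csv and can be re-read with the same reader.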