chembl / curation-interface

This is a repository to track all bugs and issues related to the development of the curation interface

Include server-side image segmentation. #100

Open mnowotka opened 9 years ago

mnowotka commented 9 years ago

Currently, the PDF viewer in the curation interface can extract images from a PDF. But sometimes the PDF itself is just one big image of a scanned document. The same applies to standalone images: they may contain only a structure, but they may also contain text, charts, etc.

In such cases there should be an option to send the whole document to the server and perform image segmentation there. The recognized segments should be marked on the original image on the client side, and the user can then choose one or more segments to send to the server again, this time for optical structure recognition.
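To make the two round trips concrete, here is a minimal sketch of the server-side part, assuming the page has already been binarized into a grid of 0/1 pixels. The names `find_segments`, `crop`, and the `gap` parameter are hypothetical, and plain connected-component grouping is only a stand-in for whatever segmentation method ends up being used. The first call would return bounding boxes for the client to draw on the original image; the second would cut out the segments the user selected before passing them to structure recognition.

```python
from collections import deque
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def find_segments(pixels: List[List[int]], gap: int = 1) -> List[Box]:
    """Group dark pixels (value 1) that lie within `gap` pixels of each
    other and return one bounding box per group, in scan order."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if pixels[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over the group that contains (x, y).
            left, top, right, bottom = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny in range(max(0, cy - gap), min(height, cy + gap + 1)):
                    for nx in range(max(0, cx - gap), min(width, cx + gap + 1)):
                        if pixels[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one selected segment out of the page for the second request."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


# Toy page with two separate blobs (hypothetical data).
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
segments = find_segments(page)      # step 1: boxes to mark on the client
selected = crop(page, segments[1])  # step 2: user picked the second box
```

A larger `gap` would merge nearby fragments (for example, atoms and bonds of one drawing) into a single segment, at the cost of sometimes gluing a structure to adjacent text.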

Then we will be able to use a combination of two software packages: OSRA and Indigo. Without segmentation, using Indigo makes less sense, as it can't handle images that contain text and other non-structure content.
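One possible way to combine the two packages, sketched under assumptions: each package is wrapped in a callable that takes the bytes of a single segment and returns a structure string or `None`, and the wrappers are tried in order. The function name `recognize_structure` and the wrappers are hypothetical; the stubs below are placeholders and do not call OSRA or Indigo.

```python
from typing import Callable, Iterable, Optional

# A recognizer takes the image bytes of one segment and returns a
# structure string (e.g. SMILES), or None if it could not read it.
Recognizer = Callable[[bytes], Optional[str]]


def recognize_structure(segment_image: bytes,
                        recognizers: Iterable[Recognizer]) -> Optional[str]:
    """Try each recognizer in order and return the first non-empty result."""
    for recognizer in recognizers:
        result = recognizer(segment_image)
        if result:
            return result
    return None


# Placeholder wrappers; real ones would invoke the actual packages.
def first_stub(image: bytes) -> Optional[str]:
    return None  # pretend the first package failed on this segment


def second_stub(image: bytes) -> Optional[str]:
    return "CCO"  # pretend the second package recognized ethanol


smiles = recognize_structure(b"segment bytes", [first_stub, second_stub])
```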