gawati / gawati-editor-ui

Gawati Client
GNU Affero General Public License v3.0

Service integration to extract full text #37

Open kohsah opened 6 years ago

kohsah commented 6 years ago

When an attachment is uploaded, editor-ui needs to call a service that extracts the full text of the document. This could be a button in the attachments tab that, when clicked, generates the XML full text and saves it to eXist-db.

For PDF, we already have https://github.com/gawati/pd2xml-service/tree/dev (the pdfminer branch has the Python 3.6 implementation).

For other file types, we will need an intermediate layer into which extraction services for each type can be plugged.
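
One possible shape for that intermediate layer is a registry that maps a MIME type to an extractor function, so a new service can be plugged in without touching the dispatch code. The sketch below is only an illustration under assumptions: the names `register_extractor`, `extract_full_text` and `UnsupportedTypeError`, and the `<fulltext>` wrapper element, are hypothetical and not part of any existing Gawati service. A PDF extractor would be registered the same way and would delegate to the PDF service mentioned above.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape

# Hypothetical contract: an extractor takes the raw attachment bytes
# and returns the full text as an XML string.
Extractor = Callable[[bytes], str]

# Registry of extractors, keyed by lower-cased MIME type.
_EXTRACTORS: Dict[str, Extractor] = {}


class UnsupportedTypeError(Exception):
    """Raised when no extractor is registered for a MIME type."""


def register_extractor(mime_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that plugs an extractor in for the given MIME type."""
    def decorator(func: Extractor) -> Extractor:
        _EXTRACTORS[mime_type.lower()] = func
        return func
    return decorator


def extract_full_text(mime_type: str, data: bytes) -> str:
    """Dispatch to whichever extractor is registered for mime_type."""
    extractor = _EXTRACTORS.get(mime_type.lower())
    if extractor is None:
        raise UnsupportedTypeError("no extractor for " + mime_type)
    return extractor(data)


@register_extractor("text/plain")
def extract_plain_text(data: bytes) -> str:
    # Assumed output format: text wrapped in a hypothetical <fulltext> element.
    text = data.decode("utf-8", errors="replace")
    return "<fulltext>" + escape(text) + "</fulltext>"
```

With this layout, the attachments-tab button would only need to pass the attachment's MIME type and content to `extract_full_text`; unsupported types surface as a single, catchable error rather than a per-type check in the UI code.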