When an attachment is uploaded, editor-ui needs to call an integrated service that extracts the full text of the document. One option is a button in the attachments tab that, when clicked, generates the XML full text and saves it to eXist-db.
For PDF we have https://github.com/gawati/pd2xml-service/tree/dev (the pdfminer branch has the Python 3.6 implementation).
For other file types, we will need an intermediate layer that allows extraction services for additional formats to be plugged in.
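
One possible shape for that intermediate layer is a small registry keyed by MIME type, so that each format's extraction service is registered once and the caller only has to look it up. The sketch below is only an illustration under stated assumptions: `ExtractorRegistry`, `extract_plain_text`, and the `<fulltext>` wrapper element are hypothetical names, not part of editor-ui or pd2xml-service, and the real PDF service would be wrapped in a function with the same signature.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape

# An extractor takes the raw attachment bytes and returns the full text as XML.
Extractor = Callable[[bytes], str]


class ExtractorRegistry:
    """Hypothetical plug-in layer: maps a MIME type to an extractor function."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register(self, mime_type: str, extractor: Extractor) -> None:
        # MIME types are case-insensitive, so normalise the key.
        self._extractors[mime_type.lower()] = extractor

    def supports(self, mime_type: str) -> bool:
        return mime_type.lower() in self._extractors

    def extract(self, mime_type: str, data: bytes) -> str:
        try:
            extractor = self._extractors[mime_type.lower()]
        except KeyError:
            raise ValueError(
                f"No full-text extractor registered for {mime_type!r}"
            ) from None
        return extractor(data)


def extract_plain_text(data: bytes) -> str:
    # Example plug-in for plain text. The <fulltext> element is an assumed
    # placeholder, not the schema actually stored in eXist-db.
    text = data.decode("utf-8", errors="replace")
    return f"<fulltext>{escape(text)}</fulltext>"


registry = ExtractorRegistry()
registry.register("text/plain", extract_plain_text)
```

With this layout, the PDF service would be added with one more `register("application/pdf", ...)` call, and the attachments tab could use `supports()` to decide whether to show the button for a given file.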