DocxToJats is a PHP library that converts DOCX archives that comply with the OOXML standard into JATS XML (Journal Article Tag Suite). It has been tested with DOCX files produced by LibreOffice, MS Word, and Google Docs.
git clone https://github.com/Vitaliy-1/docxToJats.git
cd docxToJats
php docxtojats.php [/path/to/input/file.docx or /path/to/input/dir/] [/path/to/output/file.xml or /path/to/output/dir]
For example:
- to process a single file: php docxtojats.php /mydir/file.docx /mydir/converted/file.xml (if an output filename is given, attached files such as figures are moved into the same folder);
- to process multiple files in a folder by relative path: php docxtojats.php samples/input/ samples/output/
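Because attached files such as figures are placed next to the output XML, it may be convenient to give each article its own output folder. The following is a minimal sketch of that idea, not part of docxToJats: the helper names jats_output_path and convert_each are hypothetical, and it assumes the script is run from the docxToJats checkout with php on the PATH.

```shell
#!/bin/sh
# Sketch only: jats_output_path and convert_each are hypothetical helper
# names, not part of docxToJats.

# Build the output XML path for one DOCX file, placing each article in its
# own subfolder of the output directory.
jats_output_path() {
  name=$(basename "$1" .docx)
  printf '%s/%s/%s.xml\n' "${2%/}" "$name" "$name"
}

# Convert every .docx file in an input directory, one call per file.
# Assumes it is run from the docxToJats checkout so docxtojats.php resolves.
convert_each() {
  for docx in "${1%/}"/*.docx; do
    [ -e "$docx" ] || continue        # skip when the glob matches nothing
    out=$(jats_output_path "$docx" "$2")
    mkdir -p "$(dirname "$out")"      # created up front in case the converter does not create it
    php docxtojats.php "$docx" "$out"
  done
}
```

With these definitions loaded, calling convert_each samples/input/ samples/output/ would invoke the converter once per file and target samples/output/NAME/NAME.xml for each NAME.docx. If separate folders are not needed, the built-in directory mode shown above is simpler.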
DocxToJats is used as a submodule of the DOCX Converter Plugin, written for Open Journal Systems. Unfortunately, a DOCX archive contains little metadata, so the JATS front elements remain unpopulated. The best approach is therefore to integrate docxToJats with an editorial manager from which the article's metadata can be retrieved; the DOCX Converter Plugin is one such example.