Open phollott opened 2 years ago
I found a solution that works well enough:
These changes work for what I am trying to do so far. I would have preferred a solution that needs just two models for sdt, but I struggled to get the model for cell to work. It's a work in progress, and sdt elements in Word documents are a pain, but some of my use cases require them.
@phollott would you consider a PR with your extension?
@tuurma perhaps... I have some changes that may be useful for others during DOCX to TEI conversion. The project I have been working on involves things like conversion of subscript and superscript, working around Word SDT (which is a pain), and a number of other features, which I might be able to include in a pull request.
I am using a TEI Publisher application to upload and convert DOCX files, but when the source document contains structured document tags (which some of my source documents do), the text within the tags is missing in the TEI that is generated.
To reproduce:
If you upload the attached document into TEI Publisher, the texts "TEST1" and "TEST2" are expected in the resulting TEI, but they are missing because they are embedded within structured document tags in Word, in a table cell and in a paragraph, respectively.
Thank you for any light you might be able to shed on this. I suspect this would be an additional conditional pathway in transform/docx-tei.xql to transclude any w:sdt elements in the document xml or something like that, but I have been unable to figure out how to make this work.
sdt-test.docx
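To illustrate the "transclude any w:sdt elements" idea described above, here is a minimal sketch in Python. It is not the XQuery change to transform/docx-tei.xql and not TEI Publisher's own method. It only shows the general technique of replacing each `w:sdt` wrapper with the children of its `w:sdtContent`, so that the wrapped runs and paragraphs sit where a converter expects them. The function name `unwrap_sdt` and the `SAMPLE` markup are hypothetical, and the sample assumes a block-level sdt inside the table cell, whereas the attached document may nest its tags differently.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SDT = f"{{{W_NS}}}sdt"
SDT_CONTENT = f"{{{W_NS}}}sdtContent"


def unwrap_sdt(parent):
    """Replace every w:sdt below `parent` with the children of its w:sdtContent.

    Hypothetical helper: w:sdtPr and w:sdtEndPr are dropped, and an sdt
    without w:sdtContent is removed entirely.
    """
    new_children = []
    for child in list(parent):
        if child.tag == SDT:
            content = child.find(SDT_CONTENT)
            if content is not None:
                unwrap_sdt(content)  # nested sdt elements are unwrapped first
                new_children.extend(list(content))
        else:
            unwrap_sdt(child)
            new_children.append(child)
    parent[:] = new_children  # swap in the flattened child list
    return parent


# Assumed sample: one sdt inside a table cell, one inside a paragraph.
SAMPLE = f"""<w:body xmlns:w="{W_NS}">
  <w:tbl><w:tr><w:tc>
    <w:sdt><w:sdtPr/><w:sdtContent>
      <w:p><w:r><w:t>TEST1</w:t></w:r></w:p>
    </w:sdtContent></w:sdt>
  </w:tc></w:tr></w:tbl>
  <w:p>
    <w:sdt><w:sdtPr/><w:sdtContent>
      <w:r><w:t>TEST2</w:t></w:r>
    </w:sdtContent></w:sdt>
  </w:p>
</w:body>"""

if __name__ == "__main__":
    body = unwrap_sdt(ET.fromstring(SAMPLE))
    print([t.text for t in body.iter(f"{{{W_NS}}}t")])
```

In a real run the input would come from `word/document.xml` inside the DOCX archive (readable with Python's `zipfile` module) rather than from an inline string. An equivalent rule in the conversion itself would presumably process the children of `w:sdtContent` in place of the `w:sdt` element.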