TEIC / Stylesheets

TEI XSL Stylesheets

.docx to TEI P5 XML Document conversion fails #405

Open fricke-steyer opened 4 years ago

fricke-steyer commented 4 years ago

Can you help me? Our other files are OK; only this one doesn't work. What's wrong? Kind regards, Henrike

emotion_analysis_2019.docx https://github.com/TEIC/oxgarage/files/3863581/emotion_analysis_2019.docx

Error occured. Please check the filetype and try again.?

Error: class pl.psnc.dl.ege.exception.ConverterException

Processing terminated by xsl:message at line 130 in fields.xsl

peterstadler commented 4 years ago

I did a little debugging and the error I get (from running on the command line) is

 fldSimple: unrecognized type REF BMfig_wheel \* MERGEFORMAT 

This originates from the word file here:

<w:fldSimple w:instr="REF BMfig_wheel \* MERGEFORMAT ">
    <w:r w:rsidRPr="005B4B5A">
        <w:rPr>
            <w:rStyle w:val="AbbVerweiszfdgZchn"/>
        </w:rPr>
        <w:t>1</w:t>
    </w:r>
</w:fldSimple>

-- which is the "1" reference in "The wheel (Figure 1) is constructed …"

I'm no docx expert, so I do not know which (arcane) feature this is and how to treat it right. Hence, I'd like to close it here and move it to the Stylesheets issues if anyone thinks we should follow up on this?!
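
For anyone who wants to check their own files before converting, here is a minimal sketch that lists every `w:fldSimple` field in a .docx together with its instruction string. It is not part of the Stylesheets; the function name `list_simple_fields` is made up for illustration, and it assumes the main text lives in `word/document.xml` (the usual location, though footnotes and headers are stored in other parts that this sketch does not read).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the snippet above)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_simple_fields(docx):
    """Return (field_type, instruction, displayed_text) for each w:fldSimple.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    fields = []
    for fld in root.iter(f"{{{W_NS}}}fldSimple"):
        instr = fld.get(f"{{{W_NS}}}instr", "").strip()
        # The first word of the instruction is the field type, e.g. REF or PAGE
        field_type = instr.split()[0] if instr else ""
        # Concatenate the text the field currently displays
        text = "".join(t.text or "" for t in fld.iter(f"{{{W_NS}}}t"))
        fields.append((field_type, instr, text))
    return fields
```

Calling `list_simple_fields("emotion_analysis_2019.docx")` should then include an entry whose type is `REF`, which is the one the conversion stops on.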

lb42 commented 4 years ago

Running this online with docxtotei produces the (slightly) more helpful error message:

 [xslt] fldSimple: unrecognized type REF BMfig_wheel \* MERGEFORMAT

which appears to relate to the reference to a graphic in section 2.2 :

"The wheel (Figure 1) is constructed in the fashion of a color wheel"

I don't have Word here, so I cannot be sure. However, if I delete that parenthesized reference, save the file as DOCX, and try the conversion again, everything works fine.

Maybe the problem is that the graphic file isn't included in the document?
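
If editing the document by hand is not practical, the same workaround can be approximated in a script. The sketch below keeps the displayed text (the "1") but drops the surrounding `w:fldSimple` wrapper for `REF` fields, then writes a new .docx. It is only a rough illustration under several assumptions: the names `unwrap_fields` and `rewrite_docx` are hypothetical, the document uses the conventional `w:` prefix, `word/document.xml` is UTF-8, and the `w:fldSimple` elements are not nested inside one another. It works on the raw XML text rather than re-serializing it, so everything else in the file is left byte-for-byte unchanged.

```python
import re
import zipfile

# One non-nested, non-self-closing <w:fldSimple ...>...</w:fldSimple>.
# Group 1 is the w:instr value, group 2 is the wrapped content.
FLD_SIMPLE = re.compile(
    r'<w:fldSimple\b[^>]*?\bw:instr="([^"]*)"[^>]*(?<!/)>(.*?)</w:fldSimple>',
    re.DOTALL,
)


def unwrap_fields(document_xml, field_types=("REF",)):
    """Replace matching fields with their content, leaving other fields alone."""
    def replace(match):
        instr, content = match.group(1), match.group(2)
        words = instr.split()
        if words and words[0] in field_types:
            return content          # keep the runs, drop the field wrapper
        return match.group(0)       # field type not listed: leave untouched
    return FLD_SIMPLE.sub(replace, document_xml)


def rewrite_docx(src, dst, field_types=("REF",)):
    """Copy the .docx at `src` to `dst`, unwrapping fields in word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")  # assumption: UTF-8 encoded part
                data = unwrap_fields(text, field_types).encode("utf-8")
            zout.writestr(item, data)
```

Note that this turns the cross-reference into plain text, so the link to the figure is lost in the converted TEI; it only gets the conversion past the error.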


TomazErjavec commented 4 years ago

Rather than opening a new issue, I am posting here another Word file that causes the Stylesheets to fail. At first glance it looks easier to fix than the previous one. The error is:

A sequence of more than one item is not allowed as the first argument of fn:starts-with() ("VAROVALKE_1_brez ozadja copy", "VAROVALKE_2_brez ozadja copy") ; SystemID: file:/project/tei/convert/Stylesheets/docx/from/graphics.xsl; Line#: 83; Column#: 12

TEI_Stylesheet_crash-test.docx