TEIC / Stylesheets

TEI XSL Stylesheets

A w:t in a specific location in a Docx file disappears in the conversion #99

Closed (jure closed this issue 9 years ago)

jure commented 9 years ago

I'm having a hard time debugging this one. The problematic paragraph looks roughly like this:

    <w:p w:rsidR="007C794E" w:rsidRPr="007C794E" w:rsidRDefault="007C794E" w:rsidP="007C794E">
      <w:pPr>
        <w:pStyle w:val="EndNoteBibliography"/>
        <w:spacing w:after="0" w:line="360" w:lineRule="auto"/>
        <w:ind w:left="720" w:hanging="720"/>
        <w:rPr>
          <w:sz w:val="24"/>
        </w:rPr>
      </w:pPr>
      <w:r w:rsidRPr="007C794E">
        <w:rPr>
          <w:rFonts w:cs="Arial"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r w:rsidR="000E1C70" w:rsidRPr="001A62B7">
        <w:rPr>
          <w:rFonts w:cs="Arial"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> ADDIN EN.REFLIST </w:instrText>
      </w:r>
      <w:r w:rsidRPr="007C794E">
        <w:rPr>
          <w:rFonts w:cs="Arial"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r w:rsidRPr="007C794E">
        <w:rPr>
          <w:sz w:val="24"/>
        </w:rPr>
        <w:t>1. Stenberg P, Larsson J (2011) Buffering and the evolution of chromosome-wide gene regulation. Chromosoma 120: 213-225.</w:t>
      </w:r>
    </w:p>

The text in the w:t ("1. Stenberg P, ...") is already missing from the TEI output. The result for this paragraph is:

    <p rend="EndNote Bibliography"><?biblio ADDIN EN.REFLIST?></p>

I know it has to do with the field-instruction processing triggered by those fldChar elements, but I haven't figured out the actual cause yet. I'm working on solving it as we speak, but any pointers are more than welcome!
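For anyone following along, here is a minimal Python sketch (not the stylesheet's XSLT) of how the fldChar markers split a paragraph into a field instruction and a field result. The function name `classify_runs` and the trimmed sample paragraph are illustrative assumptions. The point is that the reference text sits after the `separate` marker, so it is the field's displayed result and should survive conversion.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def classify_runs(paragraph_xml):
    """Return (kind, text) pairs for the text-bearing runs of one w:p.

    kind is 'instruction' between fldChar begin and separate,
    'result' between separate and end, and 'plain' otherwise.
    """
    p = ET.fromstring(paragraph_xml.strip())
    state = "plain"
    out = []
    for run in p.findall("w:r", NS):
        fld = run.find("w:fldChar", NS)
        if fld is not None:
            # A field character only switches state; it carries no text.
            kind = fld.get(f"{{{W}}}fldCharType")
            state = {"begin": "instruction",
                     "separate": "result",
                     "end": "plain"}.get(kind, state)
            continue
        for node in run:
            if node.tag in (f"{{{W}}}instrText", f"{{{W}}}t"):
                out.append((state, node.text or ""))
    return out


# Trimmed version of the failing paragraph (formatting runs mostly omitted).
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText xml:space="preserve"> ADDIN EN.REFLIST </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>1. Stenberg P, Larsson J (2011)</w:t></w:r>
</w:p>
"""

for kind, text in classify_runs(SAMPLE):
    print(kind, repr(text))
```

Run on the sample, the EndNote instruction comes back tagged as `instruction` and the reference text as `result`, which is the part the TEI output is losing.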

jure commented 9 years ago

It's looking more and more like this

https://github.com/TEIC/Stylesheets/blob/master/docx/from/paragraphs.xsl#L284

is at the root of the issue. In this case, current-group() contains all of the elements within the w:p, including the w:t that then goes missing. All of these elements get inserted into the ref element and disappear during further processing.
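To illustrate how the reference text can end up inside the field group, here is a small Python sketch of grouping in the style of XSLT 2.0's `group-starting-with`. It assumes a new group starts at each run carrying a fldChar of type `begin` or `end`. That pattern is a guess for illustration, not a quote from paragraphs.xsl, and the string labels are hypothetical stand-ins for real runs.

```python
def group_starting_with(items, starts_group):
    """Split items into consecutive groups, opening a new group at each
    item for which starts_group(item) is true. Items before the first
    match form a leading group of their own."""
    groups = []
    for item in items:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


def is_field_boundary(label):
    # Hypothetical pattern: only 'begin' and 'end' field characters start a group.
    return label in ("fldChar:begin", "fldChar:end")


# Runs of the failing paragraph, reduced to labels; note there is no 'end' run.
failing = ["fldChar:begin", "instrText", "fldChar:separate", "t"]
# A hypothetical paragraph in which the field is closed before more text follows.
closed = ["fldChar:begin", "instrText", "fldChar:separate", "t", "fldChar:end", "t"]

print(group_starting_with(failing, is_field_boundary))
print(group_starting_with(closed, is_field_boundary))
```

Under that assumption, a paragraph with no `end` marker yields a single group that contains the w:t as well, so whatever handles the group as a field also takes the visible text with it.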

jure commented 9 years ago

A very similar fragment produces the correct output with the reference in place, so I'm even more confused.

This is the example that converts without issues:

    <w:p w:rsidR="00A5736B" w:rsidRPr="00A5736B" w:rsidRDefault="00876740" w:rsidP="00A5736B">
      <w:pPr>
        <w:pStyle w:val="EndNoteBibliography"/>
        <w:ind w:left="720" w:hanging="720"/>
        <w:rPr>
          <w:noProof/>
        </w:rPr>
      </w:pPr>
      <w:r w:rsidRPr="008700C0">
        <w:rPr>
          <w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/>
          <w:color w:val="auto"/>
        </w:rPr>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r w:rsidR="00B54304" w:rsidRPr="008700C0">
        <w:rPr>
          <w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/>
          <w:color w:val="auto"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> ADDIN EN.REFLIST </w:instrText>
      </w:r>
      <w:r w:rsidRPr="008700C0">
        <w:rPr>
          <w:rFonts w:ascii="Cambria" w:hAnsi="Cambria"/>
          <w:color w:val="auto"/>
        </w:rPr>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r w:rsidR="00A5736B" w:rsidRPr="00A5736B">
        <w:rPr>
          <w:noProof/>
        </w:rPr>
        <w:t>1. Burleigh JG, Alphonse K, Alverson AJ, Bik HM, Blank C, et al. (2013) Next-generation phenomics for the Tree of Life. PLoS Currents 5.</w:t>
      </w:r>
    </w:p>

It's the same kind of document fragment, with the "1. Burleigh JG..." reference being part of the current-group() mentioned above. The only difference is the result: in this case the reference is not missing. I'm digging deeper. Apologies for the play-by-play; hopefully the next update will be a solution.

jure commented 9 years ago

I found the culprit, now I need to find a way to fix it. If there's a size element in the w:rPr, e.g.:

      <w:r w:rsidR="00A5736B" w:rsidRPr="00A5736B">
        <w:rPr>
          <w:noProof/>
          <w:sz w:val="24"/>
        </w:rPr>

        <w:t>1. Burleigh JG, Alphonse K, Alverson AJ, Bik HM, Blank C, et al. (2013) Next-generation phenomics for the Tree of Life. PLoS Currents 5.</w:t>
      </w:r>

Then the w:t disappears. Remove the w:sz, and the text reappears. Somewhere there's a selector that discriminates based on the presence of w:sz, and it goes wrong in this case.
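A quick way to spot this kind of difference between a working and a failing run is to compare the property names inside their w:rPr. The following is a rough Python sketch; the helper name `run_property_names` and the two shortened run fragments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_property_names(run_xml):
    """Return the local names of the children of w:rPr in a single w:r."""
    run = ET.fromstring(run_xml.strip())
    rpr = run.find(f"{{{W}}}rPr")
    if rpr is None:
        return set()
    # Tags look like '{namespace}name'; keep only the part after the brace.
    return {child.tag.split("}", 1)[1] for child in rpr}


# Shortened copies of the two runs compared in this thread.
WORKING = (f'<w:r xmlns:w="{W}"><w:rPr><w:noProof/></w:rPr>'
           '<w:t>1. Burleigh JG</w:t></w:r>')
FAILING = (f'<w:r xmlns:w="{W}"><w:rPr><w:noProof/><w:sz w:val="24"/></w:rPr>'
           '<w:t>1. Burleigh JG</w:t></w:r>')

only_in_failing = run_property_names(FAILING) - run_property_names(WORKING)
print(sorted(only_in_failing))
```

This only narrows down which properties differ. It does not show which template in the stylesheets reacts to them.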

Fix coming up.

Quick update: A temporary fix is to set preserveEffects to false.

sebastianrahtz commented 9 years ago

you are having fun :-}

I'll wait a day or two to see if you find a fix.

(Sent by email in reply to Jure Triglav's comment of 21 April 2015, 16:11, shown above.)

sebastianrahtz commented 9 years ago

have you managed to fix this, or should I try to solve it?

jure commented 9 years ago

So I went with the temporary fix (there's nothing more permanent than a temporary solution) of setting preserveEffects to false, because that's what I want anyway. I haven't found the underlying flaw, though.

The example in the original post should be enough to replicate it within a Docx document, if you were so inclined.

sebastianrahtz commented 9 years ago

Using the original example, I can reproduce the loss of the text; but removing w:sz makes no difference. And preserveEffects is set to false by default anyway....

However, I can see a flaw in pass2.xsl that prevented the text from ever appearing, in some undocumented logic I don't understand. Sigh.