Conal-Tuohy / VMCP-upconversion

Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
Apache License 2.0

Odd behaviour in footnotes in some files #54

Closed: LucasHorseshoeBend closed this issue 1 year ago

LucasHorseshoeBend commented 2 years ago

I have come across some files where the footnotes display in the XTF with the number 0 (zero) for all notes, both in the footnote reference numbers in the text and in the footnotes pane.

Two examples:

60.12.12a = http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller letters/1860-9/1860/60-12-12a-draft.xml

M70.12.07 = http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller letters/Mentions/Selected Mentions letters/M70-12-07-draft.xml

I had a look at the raw TEI XML for the second one, but can't see anything that jumps out at my inexpert eye. I also tried reinserting the notes, but to no avail.

I have no idea how many such files there are, nor how to look for them.

Any ideas for a fix?

Arthur

LucasHorseshoeBend commented 2 years ago

Another case today, but this one is different. The first note is 1, and shows as 1 in the footnotes pane, but the second is showing as 0 (zero) in the text and in the pane.

69.12.18 = http://vmcp.conaltuohy.com/xtf/view?docId=tei/Mueller letters/Mentions/1860-9/69-12-18.xml

Arthur

Conal-Tuohy commented 2 years ago

The behaviour is odd.

I've done some preliminary investigation; I looked at the first one and the TEI XML does indeed show the footnotes are numbered 0 rather than sequentially (i.e. the n="0" in the first line of the XML extract below).

<note xml:id="ftn2" type="footnote" n="0">
   <p rend="Footnote" 
      style="font-family: Geneva; margin-left: 0.3335in; margin-right: 0in; text-indent: -0.3335in; font-size: 9pt; " 
      xml:lang="en">
      <seg style="font-weight: bold; ">
         Victorian Exhibition 1861 held in Melbourne prior to London 1862. M was a commissioner.
      </seg>
   </p>
</note>
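
As a way of looking for other affected files, here is a minimal Python sketch that lists the footnotes in a TEI document whose n attribute is "0". It is not part of the conversion pipeline; the function name zero_numbered_notes and the sample document are hypothetical, and it assumes footnotes are always encoded as note elements with type="footnote", as in the extract above. It matches on the element's local name, so it should work whether or not the file declares the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def zero_numbered_notes(tei_xml: str) -> list[str]:
    """Return the xml:id of every footnote whose n attribute is "0"."""
    root = ET.fromstring(tei_xml)
    affected = []
    for element in root.iter():
        # Compare the local name only, ignoring any namespace prefix.
        local_name = element.tag.rsplit("}", 1)[-1]
        if (
            local_name == "note"
            and element.get("type") == "footnote"
            and element.get("n") == "0"
        ):
            affected.append(element.get(XML_ID, ""))
    return affected


# Hypothetical sample: the first note is numbered correctly, the second is not.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>a'
    '<note xml:id="ftn1" type="footnote" n="1"><p>x</p></note>b'
    '<note xml:id="ftn2" type="footnote" n="0"><p>y</p></note>'
    "</p></body></text></TEI>"
)
print(zero_numbered_notes(sample))  # prints ['ftn2']
```

Run over every TEI file in the collection, a check like this would flag both the files where all notes are 0 and those where only some are.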

Tracing it backwards, when I download the OpenOffice document from which the TEI was derived, and open the file in OpenOffice, I see the footnotes are numbered sequentially, i.e. the OpenOffice document is a faithful conversion of the original MS Word file.

When I extract the XML content out of the OpenOffice file, I find it contains the content below:

<text:note text:id="ftn2" text:note-class="footnote">
   <text:note-citation>2</text:note-citation>
   <text:note-body>
      <text:p text:style-name="P5">Victorian Exhibition 1861 held in Melbourne prior to London 1862. M was a commissioner.</text:p>
   </text:note-body>
</text:note>

The <text:note-citation> element contains the number 2, as you'd hope, rather than a 0.

So the conversion failure is indeed in the portion of the processing pipeline which converts the OpenOffice XML to TEI XML.

I'm going to need to debug this further; on the face of it I can't see why this isn't working. In any case, I do think the pipeline should be renumbering all the footnotes anyway, in order to handle footnotes in tables (remember this issue?)

LucasHorseshoeBend commented 2 years ago

Thanks. Sometimes when things like this happen I think there really are gremlins!!

Tables in footnotes? Issue 47

I had been busy doing other things and had forgotten that I was going to look at this using the extracted files. I'll add it to my list of things to do soon so I don't bypass it again. I will need to get my head around the Word bookmark feature, which I have never used.

Conal-Tuohy commented 2 years ago

Digging further into this, I found that the ODT file produced by the OpenOffice converter can in fact contain footnotes whose marker is 0. I enabled a fallback that labels the footnote using the numeric suffix of the text:id attribute of the text:note element itself, e.g. converting ftn2 to 2.
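
The fallback can be illustrated with a short Python sketch. This is not the pipeline's own code; the function name footnote_label is hypothetical, and it assumes the fallback applies only when the marker is 0 or empty.

```python
import re


def footnote_label(citation: str, note_id: str) -> str:
    """Choose a label for a footnote.

    Use the marker from text:note-citation when it looks usable;
    otherwise fall back to the trailing digits of the text:id value.
    """
    citation = citation.strip()
    if citation and citation != "0":
        return citation
    # Assumption: ids look like "ftn2", i.e. they end in the note's number.
    match = re.search(r"(\d+)$", note_id)
    return match.group(1) if match else citation


print(footnote_label("0", "ftn2"))  # prints 2
```

If the id has no numeric suffix, the sketch returns the original marker unchanged rather than guessing.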

But this will be solved imminently by automatic renumbering of all footnotes.
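
As an illustration of what such renumbering could look like, here is a minimal Python sketch that assigns n="1", n="2", and so on to footnotes in document order, wherever they sit in the text. It is only a sketch under assumptions: the function name renumber_footnotes and the sample document are hypothetical, and it assumes every footnote is a note element with type="footnote", as in the TEI extract earlier in this thread.

```python
import xml.etree.ElementTree as ET


def renumber_footnotes(root: ET.Element) -> int:
    """Number footnotes 1, 2, 3... in document order; return the count."""
    count = 0
    # iter() walks the tree in document order, so notes nested inside
    # other structures are numbered where they occur in the text.
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "note" and element.get("type") == "footnote":
            count += 1
            element.set("n", str(count))
    return count


# Hypothetical sample with two footnotes both numbered 0.
doc = ET.fromstring(
    '<text><p>a<note type="footnote" n="0"><p>x</p></note></p>'
    '<p>b<note type="footnote" n="0"><p>y</p></note></p></text>'
)
renumber_footnotes(doc)
print([note.get("n") for note in doc.iter("note")])  # prints ['1', '2']
```

Because the numbers are derived from position alone, whatever marker the ODT file carried no longer matters.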