Conal-Tuohy / VMCP-upconversion

Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
Apache License 2.0

Horizontal alignment in lists #28

Closed LucasHorseshoeBend closed 7 years ago

LucasHorseshoeBend commented 7 years ago

As a simple example see http://vmcp.conaltuohy.com/xtf/view?docId=tei/1870-9/1872/72-09-26-final.xml [I can't see why this file is returned as the only invalid file now in the set of files: can you?] In the second tabular block Ern Ommanney should be aligned with F.R.G.S. I think the misalignment is caused by the footnote.

I could ensure alignment by having a three-row table, but we tried to improve appearance by minimising horizontal row margins.

Alignment is absolutely critical to meaning in many files. The best example of problems is http://vmcp.conaltuohy.com/xtf/view?docId=tei/1840-9/1840-4/44-00-00-final.xml The first table, below "Ostern 1847", shows that the issue is not due solely to the insertion of footnotes under the paragraph containing the footnote reference number. Parmelia in column 1 should be aligned with Carex in column 2. In this case I think I understand what caused the problem: when I was cleaning up the original (your extremely useful diagnostics showed it was a complete mish-mash of styles, and it turns out that there was a lot of manual adjustment to make it "look right"), I did not match the empty paragraphs in column 2 to the appropriate style in the first column. However, in the next table, either the footnotes or the differences in the width of the columns relative to the text in the source document and the XTF (or both) have caused the misalignments. Once again, having seen this display, I think I could fix at least part of the issue by forcing paragraphs at line breaks, but that would not remedy the footnote problem, shown most clearly in the table under " d. 11. Oct. 1845. ", where the original had an entry in each column in each row, so that "Zostera minor" was in the same row as |26 | Ex | in | 8 | [...]

I know we are working in edit view, but it would help if I were able to get a good idea of what an end-user view would be. At the moment I do not have a feel for the final effect of our efforts to control displays while avoiding the insertion of lots of tabular rows which do not "feel right". But it looks as if, to control alignment, we may need to do this, or else devise a different way of displaying footnotes.

There are other alignment problems in http://vmcp.conaltuohy.com/xtf/view?docId=tei/1840-9/1840-4/44-00-00-final.xml, which I will need to try to work out how to control, because tables are not an easy solution: see the list starting "Callitriche autumnalis. L." where the species names are aligned to the right of the relevant genus, controlled in the file by empty spaces, but which would not easily be replaced by tabs for the reasons you have pointed out before. Compare the source file with the XTF for the 9 species listed below "Potamogeton lucens. L." I need to think more about this.

Conal-Tuohy commented 7 years ago

I've had a look at the first document mentioned, 72-09-26-final.xml, and I agree the alignment is spoiled by the appearance of the footnote.

I could ensure alignment by having a three-row table, but we tried to improve appearance by minimising horizontal row margins.

I think what you've suggested is in fact the correct solution: to break each of the first and second cells in the table into three cells, i.e. each pair of name and "F.R.G.S." should be a row of its own. Clearly each of the names corresponds individually to one of the initialisms; the use of a table row would make that logical correspondence explicit (independently of how the table might be displayed). It's about recording the semantics of the table, rather than the purely visual aspect.

To give some pragmatic reasons: imagine the case that a visually impaired person is reading the text using a "screen reader". As the text is encoded now, their screen reader would read the table row by row, and within each row, it would read first one cell, then the next. The effect would be to read the table as a list of three names, followed by a list of the three initialisms, whereas in fact the table makes more sense read as a list of three name/initialism pairs.
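The reading-order point can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two TEI fragments are hypothetical (no namespace, placeholder names "Name 1" to "Name 3" rather than the names in the letter), and `reading_order` is an invented helper that simply walks rows, then cells, in document order, which is roughly what a screen reader would do.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical encoding A: one row, each cell holding three stacked paragraphs.
ONE_ROW = """<table>
  <row>
    <cell><p>Name 1</p><p>Name 2</p><p>Name 3</p></cell>
    <cell><p>F.R.G.S.</p><p>F.R.G.S.</p><p>F.R.G.S.</p></cell>
  </row>
</table>"""

# Hypothetical encoding B: one row per name/initialism pair.
ROW_PER_PAIR = """<table>
  <row><cell>Name 1</cell><cell>F.R.G.S.</cell></row>
  <row><cell>Name 2</cell><cell>F.R.G.S.</cell></row>
  <row><cell>Name 3</cell><cell>F.R.G.S.</cell></row>
</table>"""


def reading_order(tei_table: str) -> List[str]:
    """Return the text chunks in the order a row-by-row, cell-by-cell reader meets them."""
    root = ET.fromstring(tei_table)
    spoken = []
    for row in root.iter("row"):
        for cell in row.iter("cell"):
            for chunk in cell.itertext():
                text = " ".join(chunk.split())  # drop indentation-only chunks
                if text:
                    spoken.append(text)
    return spoken


print(reading_order(ONE_ROW))       # three names first, then three initialisms
print(reading_order(ROW_PER_PAIR))  # name, initialism, name, initialism, ...
```

With one row per pair, the order in which the text is read matches the logical correspondence, whatever the visual styling turns out to be.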

Also, in the hypothetical case that the document were rendered on a narrow display (on a mobile phone, for instance, or on a regular screen but blown up to a large font size, either by a visually impaired person or by someone presenting the letter to a class in a lecture theatre), there's a possibility that the longest row "(Signed) F. Mueller" would wrap into two lines (between the "F." and the "Mueller"), whereas the second column would not wrap (because it contains only a single, unbreakable, token). This would cause a misalignment again.
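The wrapping problem can be sketched in a few lines of Python, under stated assumptions: the column width of 14 characters is arbitrary, the order of the two entries is a guess (the third name is left out), and `start_lines` is a hypothetical helper that counts on which display line each stacked entry of a cell would begin once long entries wrap.

```python
import textwrap
from typing import List


def start_lines(items: List[str], width: int) -> List[int]:
    """Line index at which each stacked item begins when the cell is `width` chars wide."""
    starts, line = [], 0
    for item in items:
        starts.append(line)
        line += len(textwrap.wrap(item, width=width))  # a wrapped item takes extra lines
    return starts


# Assumed order; only entries quoted in this thread are used.
names = ["(Signed) F. Mueller", "Ern Ommanney"]
initialisms = ["F.R.G.S.", "F.R.G.S."]

print(start_lines(names, 40), start_lines(initialisms, 40))  # wide display: both [0, 1]
print(start_lines(names, 14), start_lines(initialisms, 14))  # narrow: [0, 2] versus [0, 1]
```

On the narrow display the second name starts one line lower than its initialism. If each pair were a row of its own, the row would simply grow taller and the pair would stay together.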

If formatting these texts as tables causes excessive vertical separation (i.e. too much white space between table rows), then this should I think be tackled by formatting the table manually to minimise the margins.

The table in http://vmcp.conaltuohy.com/xtf/view?docId=tei/1840-9/1840-4/44-00-00-final.xml is I think an instance of the exact same thing. The horizontal alignments in the original text are essential, as you say, to understanding the meaning, and hence it's necessary to capture those alignments explicitly, as tables. To reiterate, the point of a table is to capture not just the visual appearance of the text at that point, but also the logical intent of the author; not just the (visual) alignment of the items, but their (logical) correspondence.

The final text you mention is maybe a bit different. This is a list of species where the first species from a genus is given a binomial name, but the remaining species of that genus have only a white space (sometimes including a dash or ditto marks), followed by the specific epithet. In some cases (see Carex extensa) there's an authority that appears at the head of a list of species and is implied by a dash in subsequent lines.

I notice that the white space is not being displayed in XTF at all, and I think that's a bug; it's significant white space. The white space has certainly migrated successfully into the TEI:

<p rend="letter">Potamogeton lucens. L.</p>
<p rend="letter">
   <seg style="">
      <space unit="chars" quantity="21"/>
   </seg>
   <seg>β</seg>. minor. N.
</p>
<p rend="letter">
   <seg style="">
      <space unit="chars" quantity="21"/>
   </seg>
   <seg>γ.</seg> acuminatus N.
</p>

One could treat these as a two (or three) column table, too, I suppose? But I think if the white space were actually being displayed, it might be OK as it is, without further editing. Shall we review it once I've got the white space displayed?
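For illustration only, here is a minimal Python sketch of what displaying the white space involves. It is not the XTF code, and it makes several assumptions: the paragraph is the second one quoted above, compacted onto one line so that indentation does not leak into the output; no TEI namespace is declared; `render` is a hypothetical helper; and each `<space unit="chars">` is expanded to that many no-break spaces, since HTML collapses runs of ordinary spaces.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # no-break space, which HTML does not collapse

# The second paragraph from the TEI above, compacted onto one line (an assumption).
PARAGRAPH = (
    '<p rend="letter"><seg style=""><space unit="chars" quantity="21"/></seg>'
    "<seg>β</seg>. minor. N.</p>"
)


def render(el: ET.Element) -> str:
    """Flatten an element to text, expanding <space unit="chars"> into no-break spaces."""
    if el.tag == "space" and el.get("unit") == "chars":
        out = NBSP * int(el.get("quantity", "1"))
    else:
        out = el.text or ""
    for child in el:
        out += render(child) + (child.tail or "")
    return out


line = render(ET.fromstring(PARAGRAPH))
print(repr(line))  # 21 no-break spaces followed by 'β. minor. N.'
```

The indent only lines up with the genus name above it if the display uses a fixed-width font, which is one reason a table might still be the safer encoding.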

LucasHorseshoeBend commented 7 years ago

Thanks; I understand your point about forgetting about display at this stage and concentrating on the logic. I'll work on files accordingly.

If we can get significant white space to display that would be brilliant.

At what stage will we develop a display version? It will need to be before we get the whole 15K files to final? We want to put up a significant chunk, i.e., the files at final, as soon as we can.

Conal-Tuohy commented 7 years ago

I've added another "issue" #29 for the question of the online publication itself. I'll leave this issue open until the white space display is sorted.

Conal-Tuohy commented 7 years ago

White space should now be displayed in XTF. The specific issue you pointed out, with the lists of species, looks pretty good to me, now. What do you think?

LucasHorseshoeBend commented 7 years ago

This is now working well and the relationship between one line and the line (or lines) below is preserved. Layout is not perfect in some other parts of http://vmcp.conaltuohy.com/xtf/view?docId=tei/1840-9/1840-4/44-00-00-final.xml, but we won't worry about that until I can see what it looks like in a user view.

Conal-Tuohy commented 7 years ago

OK I'll close this issue then and we can open a new one when and if it proves necessary.