DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP
http://dclp.github.io/dclpxsltbox/

spiritus lenis falsely displayed when combined with circumflex and underdot #284

Closed rla2118 closed 7 years ago

rla2118 commented 7 years ago

This issue concerns http://www.litpap.info/dclp/65795.

At the beginning of col. 9, line 1, there is a display problem where circumflex + spiritus lenis is combined with an underdot: ε̣ῖ̣̣̓ν̣α̣. The spiritus (᾿) should appear below the circumflex: ε̣ἶ̣ν̣α̣ι̣

wsalesky commented 7 years ago

@rla2118 @paregorios The problematic display is due to extra string processing applied to text in t:unclear

See line: https://github.com/DCLP/idp.data/blob/master/DCLP/66/65795.xml#L221

In the HTML (http://www.litpap.info/dclp/65795), the first word in the first line of column 9 displays incorrectly (ε̣ῖ̣̣̓ν̣α̣), while the first word in the second line displays correctly (εἶναι).

The only difference is that the first instance is tagged as unclear, which is processed using the normalize-space() and normalize-unicode() functions. Can I get some elucidation on the reasons for using normalize-unicode() on t:unclear? (Side note: it also seems to strip the last character from the above string.)
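
For illustration only, here is a rough Python sketch (not the XSLT itself) of one way per-character string processing can scatter underdots across a Greek letter that carries a breathing and a circumflex. It assumes, hypothetically, that the dot is appended after every code point; whether the stylesheet actually does this has not been checked here, and the function names are made up for the example.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def underdot_every_code_point(text):
    """Naive approach: append a dot below after every code point, marks included."""
    return "".join(ch + DOT_BELOW for ch in text)


def underdot_base_characters(text):
    """Append one dot below per base character, after its own combining marks."""
    clusters = []
    for ch in text:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch  # keep the mark attached to its base character
        else:
            clusters.append(ch)
    return "".join(cluster + DOT_BELOW for cluster in clusters)


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Decomposed iota + psili (U+0313) + perispomeni (U+0342)
decomposed = "\u03b9\u0313\u0342"

print(code_points(underdot_every_code_point(decomposed)))
# U+03B9 U+0323 U+0313 U+0323 U+0342 U+0323

print(code_points(underdot_base_characters(decomposed)))
# U+03B9 U+0313 U+0342 U+0323
```

With decomposed input the naive version yields three dots interleaved with the accents, while the cluster-aware version yields a single dot after the accents. Dumping code points like this for the two words on the page would show exactly where they differ.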

paregorios commented 7 years ago

@wsalesky I assume the call to normalize-unicode is intended to prevent the sort of mess that we're seeing here. One would have thought that NFC (which is what we're supposed to get from normalize-unicode) wouldn't do what @rla2118 is seeing. So, either our expectations of the NFC spec are incorrect, or there's a bug in Saxon's implementation of the function. I'm going to try taking that out entirely and seeing how many files change.
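
As a quick sanity check on the NFC expectation, the following Python sketch normalizes the iota cluster with Python's `unicodedata` module. This is a different implementation from Saxon's, so it says nothing about Saxon's behavior; it only shows what the Unicode NFC rules produce for this cluster. The helper name `nfc_code_points` is invented for the example.

```python
import unicodedata

PSILI = "\u0313"        # COMBINING COMMA ABOVE (spiritus lenis), combining class 230
PERISPOMENI = "\u0342"  # COMBINING GREEK PERISPOMENI (circumflex), combining class 230
DOT_BELOW = "\u0323"    # COMBINING DOT BELOW (underdot), combining class 220


def nfc_code_points(text):
    """Return the code points of the NFC-normalized form of text."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# Fully decomposed: iota, psili, perispomeni, underdot
decomposed = "\u03b9" + PSILI + PERISPOMENI + DOT_BELOW
# Precomposed iota with psili and perispomeni (U+1F36), then underdot
precomposed = "\u1f36" + DOT_BELOW

print(nfc_code_points(decomposed))   # ['U+1F36', 'U+0323']
print(nfc_code_points(precomposed))  # ['U+1F36', 'U+0323']
```

Both inputs normalize to the precomposed letter followed by a single underdot, because the underdot's lower combining class lets the breathing and circumflex compose past it. That is consistent with the expectation that NFC on its own should not separate the spiritus from the circumflex.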

paregorios commented 7 years ago

120 changes

paregorios commented 7 years ago

Just a few spot checks seem to show that this improves things, but I would like to look at several more before we go this route. Here's a list of all the files whose transforms change if we eliminate the call to normalize-unicode when processing unclear:

    modified:   output/dclp/113/112358.html
    modified:   output/dclp/113/112370.html
    modified:   output/dclp/114/113269.html
    modified:   output/dclp/119/118692.html
    modified:   output/dclp/120/119281.html
    modified:   output/dclp/120/119313.html
    modified:   output/dclp/120/119314.html
    modified:   output/dclp/120/119316.html
    modified:   output/dclp/120/119318.html
    modified:   output/dclp/129/128954.html
    modified:   output/dclp/129/128955.html
    modified:   output/dclp/129/128957.html
    modified:   output/dclp/129/128965.html
    modified:   output/dclp/129/128973.html
    modified:   output/dclp/129/128974.html
    modified:   output/dclp/129/128975.html
    modified:   output/dclp/140/139884.html
    modified:   output/dclp/141/140285.html
    modified:   output/dclp/141/140289.html
    modified:   output/dclp/141/140292.html
    modified:   output/dclp/155/154389.html
    modified:   output/dclp/176/175280.html
    modified:   output/dclp/244/243963.html
    modified:   output/dclp/371/370035.html
    modified:   output/dclp/371/370041.html
    modified:   output/dclp/59/58911.html
    modified:   output/dclp/60/59099.html
    modified:   output/dclp/60/59112.html
    modified:   output/dclp/60/59446.html
    modified:   output/dclp/60/59450.html
    modified:   output/dclp/60/59467.html
    modified:   output/dclp/60/59468.html
    modified:   output/dclp/60/59519.html
    modified:   output/dclp/60/59615.html
    modified:   output/dclp/60/59697.html
    modified:   output/dclp/60/59744.html
    modified:   output/dclp/60/59749.html
    modified:   output/dclp/60/59751.html
    modified:   output/dclp/60/59752.html
    modified:   output/dclp/60/59753.html
    modified:   output/dclp/60/59754.html
    modified:   output/dclp/60/59757.html
    modified:   output/dclp/60/59758.html
    modified:   output/dclp/60/59759.html
    modified:   output/dclp/60/59760.html
    modified:   output/dclp/60/59762.html
    modified:   output/dclp/60/59838.html
    modified:   output/dclp/60/59867.html
    modified:   output/dclp/60/59960.html
    modified:   output/dclp/60/59962.html
    modified:   output/dclp/60/59969.html
    modified:   output/dclp/61/60048.html
    modified:   output/dclp/61/60170.html
    modified:   output/dclp/61/60180.html
    modified:   output/dclp/61/60189.html
    modified:   output/dclp/61/60191.html
    modified:   output/dclp/61/60192.html
    modified:   output/dclp/61/60335.html
    modified:   output/dclp/61/60761.html
    modified:   output/dclp/62/61353.html
    modified:   output/dclp/63/62382.html
    modified:   output/dclp/63/62386.html
    modified:   output/dclp/63/62387.html
    modified:   output/dclp/63/62390.html
    modified:   output/dclp/63/62391.html
    modified:   output/dclp/63/62400.html
    modified:   output/dclp/63/62416.html
    modified:   output/dclp/63/62419.html
    modified:   output/dclp/63/62433.html
    modified:   output/dclp/63/62437.html
    modified:   output/dclp/63/62439.html
    modified:   output/dclp/63/62441.html
    modified:   output/dclp/63/62444.html
    modified:   output/dclp/63/62445.html
    modified:   output/dclp/63/62448.html
    modified:   output/dclp/63/62450.html
    modified:   output/dclp/63/62460.html
    modified:   output/dclp/63/62463.html
    modified:   output/dclp/63/62471.html
    modified:   output/dclp/63/62476.html
    modified:   output/dclp/63/62477.html
    modified:   output/dclp/63/62478.html
    modified:   output/dclp/63/62479.html
    modified:   output/dclp/63/62498.html
    modified:   output/dclp/63/62499.html
    modified:   output/dclp/63/62500.html
    modified:   output/dclp/63/62511.html
    modified:   output/dclp/63/62580.html
    modified:   output/dclp/64/63072.html
    modified:   output/dclp/64/63332.html
    modified:   output/dclp/64/63409.html
    modified:   output/dclp/64/63688.html
    modified:   output/dclp/64/63707.html
    modified:   output/dclp/64/63799.html
    modified:   output/dclp/64/63976.html
    modified:   output/dclp/65/64026.html
    modified:   output/dclp/65/64216.html
    modified:   output/dclp/65/64236.html
    modified:   output/dclp/65/64255.html
    modified:   output/dclp/65/64276.html
    modified:   output/dclp/66/65076.html
    modified:   output/dclp/66/65239.html
    modified:   output/dclp/66/65340.html
    modified:   output/dclp/66/65508.html
    modified:   output/dclp/66/65528.html
    modified:   output/dclp/66/65542.html
    modified:   output/dclp/66/65632.html
    modified:   output/dclp/66/65795.html
    modified:   output/dclp/68/67136.html
    modified:   output/dclp/68/67837.html
    modified:   output/dclp/68/67839.html
    modified:   output/dclp/69/68626.html
    modified:   output/dclp/69/68627.html
    modified:   output/dclp/69/68628.html
    modified:   output/dclp/69/68629.html
    modified:   output/dclp/70/69389.html
    modified:   output/dclp/70/69508.html
    modified:   output/dclp/93/92139.html
    modified:   output/dclp/93/92141.html
    modified:   output/dclp/93/92142.html

paregorios commented 7 years ago

I am now confident that eliminating the call to normalize-unicode() when processing <unclear> will fix the problem that @rla2118 reported here, plus another in which breathings simply are lost entirely. I'm going to make the commit direct to master, with a comment referencing this ticket.

paregorios commented 7 years ago

@rla2118 and @jcowey when @m-k-r does his next pull from the epidoc-xslt master and regenerates/reindexes, you should see this fix in litpap.info.