TEIC / Stylesheets

TEI XSL Stylesheets

Japanese text is garbled in figure captions in PDF output #151

Open martindholmes opened 8 years ago

martindholmes commented 8 years ago

The current build of the Guidelines PDF, p.203, Figure 3:

http://jenkins-paderborn.tei-c.org/job/TEIP5/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/Guidelines.pdf

shows a clear example of this. Compare the same part of the HTML, where it's described as Figure 5.2 (subject of a different ticket):

http://jenkins.tei-c.org/job/TEIP5/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/WD.html#WDWMEG2

peterstadler commented 8 years ago

@smjwsk already has a patch (locally) in place, I believe.

martindholmes commented 4 years ago

If @smjwsk did have a patch, we never got it. This is still a problem, four years on.

peterstadler commented 4 years ago

My dirty hack for that caption would be to insert a <seg xml:lang='ja'> into the <title> at https://github.com/TEIC/TEI/blob/513c0b14a94a157ddede60dea902d8cd00208224/P5/Source/Guidelines/en/WD-NonStandardCharacters.xml#L992. As noted at https://github.com/TEIC/Stylesheets/issues/202#issuecomment-610989143, the processing of @xml:lang is very limited and only works for some elements.

martindholmes commented 4 years ago

As of now, it's p.211 Figure 3, which has this caption:

5.7 Examples of Different Writing ModesFigure 3:Detail from p.62 of￿￿￿￿￿￿￿”. ￿￿￿. 1985. ￿￿￿￿￿￿￿ II. ￿￿￿￿￿￿ 11

martindholmes commented 4 years ago

Note that this is not fixed by https://github.com/TEIC/TEI/commit/66144683a4a210a8ea6a5e23dffc74ad80d142b6 which switches to Noto fonts; I believe that the @xml:lang attribute is not being acted on here:

<figure>
  <graphic width="500px" height="624px" url="Images/ja_vertical_indonesian_frag.jpg"/>
  <head>Detail from p.62 of <title xml:lang="ja">インドネシア語". 崎山理. 1985. 外国語との対照  II. 講座日本語学 11.</title></head>
</figure>

although I haven't confirmed that yet.

peterstadler commented 4 years ago

Exactly @martindholmes, as I wrote in https://github.com/TEIC/Stylesheets/issues/202#issuecomment-610981840

It turned out it's not a font issue but a processing issue – which is way bigger, sigh. There is actually almost no support for @xml:lang except for occurrences on <bibl> and <seg>.

Hence the proposed hack to insert a <seg xml:lang='ja'> into the <title>.
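
For readers following along, the proposed hack might look roughly like the fragment below. This is only a sketch based on the figure markup quoted earlier in the thread: it assumes the <seg> wraps the entire content of the <title> and that the existing xml:lang on <title> is left in place. Neither detail is confirmed in this thread, and the result has not been checked against a PDF build.

```xml
<figure>
  <graphic width="500px" height="624px" url="Images/ja_vertical_indonesian_frag.jpg"/>
  <!-- Workaround sketch (assumption): the inner <seg> repeats xml:lang="ja"
       because, per the discussion above, @xml:lang is currently only
       acted on for <bibl> and <seg>, not for <title>. -->
  <head>Detail from p.62 of <title xml:lang="ja"><seg xml:lang="ja">インドネシア語". 崎山理. 1985. 外国語との対照  II. 講座日本語学 11.</seg></title></head>
</figure>
```

This changes the Guidelines source rather than the Stylesheets, so it would only mask the problem for this one caption; any other element carrying @xml:lang would still be affected.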

martindholmes commented 4 years ago

@peterstadler Hmm. Should we hide the problem in this case and pretend it's not there, or try and fix it?

duncdrum commented 4 years ago

I've seen FOP do this when either a) only essential Linux font support libraries exist on a VM, or b) the JVM running FOP was not properly set to Unicode encoding for all its locale and encoding settings. There are some other corner cases, but either or both of these would be my best guess.

martindholmes commented 4 years ago

Japanese fonts appear to be working fine in the latest build with Noto; I think this is just a case where @xml:lang is being ignored in processing, as @peterstadler suggests.

peterstadler commented 4 years ago

@pstadler Hmm. Should we hide the problem in this case and pretend it's not there, or try and fix it?

Just for the record: My GitHub handle is @peterstadler. I got confused myself about this "Doppelgänger".

pstadler commented 4 years ago

Yes, I got confused as well. Hey Peter. All good?

martindholmes commented 4 years ago

Sorry @pstadler @peterstadler. @peterstadler is @pstadler in Slack, no?