TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Ruby: Guidelines perhaps need to take a position on use for non-East Asian languages #2109

Closed martindholmes closed 3 years ago

martindholmes commented 3 years ago

During the discussions on ruby, the question arose of whether examples like this:

https://dutchanglosaxonist.files.wordpress.com/2017/10/dotglosses1.jpg?w=730

could/should be encoded using ruby elements. We've so far avoided taking a strong position on this, but it does deserve attention.

knagasaki commented 3 years ago

In my understanding, if the encoder wants to treat the example as two parallel texts, it should be encoded as two texts using <s> or some other element for a chunk of text, with some kind of alignment mechanism such as @xml:id/@target. If it is encoded with ruby, it will be somewhat difficult to treat both as two continuous texts. However, if an encoder wants to focus strongly on the relationship between a word and its translation, it should be encoded with <ruby>.
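
As a rough illustration of the two approaches described above, the following TEI fragment encodes the same glossed phrase both ways. The Latin and Old English words, the xml:id values, and the use of <link> inside <linkGrp> to carry @target are assumptions made for the example; this is not a transcription of the linked image, and it is a sketch rather than a recommendation from the Guidelines.

```xml
<!-- Option 1: two continuous texts, aligned word by word.
     The sample words and all xml:id values are hypothetical. -->
<ab>
  <s xml:id="lat1" xml:lang="la">
    <w xml:id="lat1.w1">pater</w>
    <w xml:id="lat1.w2">noster</w>
  </s>
  <s xml:id="ang1" xml:lang="ang">
    <w xml:id="ang1.w1">fæder</w>
    <w xml:id="ang1.w2">ure</w>
  </s>
  <!-- Each <link> pairs one base word with its gloss. -->
  <linkGrp type="gloss">
    <link target="#lat1.w1 #ang1.w1"/>
    <link target="#lat1.w2 #ang1.w2"/>
  </linkGrp>
</ab>

<!-- Option 2: each gloss attached directly to the word it annotates. -->
<ab xml:lang="la">
  <ruby>
    <rb>pater</rb>
    <rt place="above" xml:lang="ang">fæder</rt>
  </ruby>
  <ruby>
    <rb>noster</rb>
    <rt place="above" xml:lang="ang">ure</rt>
  </ruby>
</ab>
```

In option 1 both texts can be read or extracted as continuous sentences, which is the advantage described above. In option 2 the word-to-gloss relationship is explicit, but the gloss text is interleaved with the base text.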

martindholmes commented 3 years ago

@knagasaki I agree. Parallel texts are not the same thing as ruby, although perhaps they overlap to some degree.

HelenaSabel commented 3 years ago

@knagasaki makes a great point: in a case in which every word is glossed, there are alignment methods to tackle this that seem more efficient than ruby.

A Western example that came to my mind when reading the ruby section was this one: facsimile. Here, the main text is in Latin, and certain Latin words were considered obscure, so an interlinear annotation was provided. These annotations are either in Latin (offering a better-known or simplified synonym), in Romance (~ proto-Aragonese?) or in Basque (if anyone wants more details, see the Wikipedia article).

  1. As @lujessica pointed out, the first thing to consider is the appropriateness of taking an element out of its cultural context out of convenience. Would any Western use of ruby be really appropriate?

  2. Provided that the use for non-East Asian texts is accepted by the East Asian community, I wonder what type of uses could “rightfully” be encoded as ruby. The English Wikipedia article about ruby annotations insists on the phonetic aspect of the glosses. The introduction of the ruby section in the Guidelines also focuses on their use as “pronunciation guidance”. Does this mean that a case like the one I mention above would not be suitable as ruby because the annotations are semantic equivalents?

ebeshero commented 3 years ago

@HelenaSabel presents a good test case for the question of whether ruby glosses could be applied outside of East Asian contexts. It is worth noting that the Wikipedia article on ruby glosses does not restrict their use to pronunciation, though that certainly appears to be their primary use. The article points out that ruby can be used for semantic reasons to disambiguate the meaning of a word when it can be read in multiple ways. If I understand correctly, the function of ruby seems to be to assist readers who have only partial familiarity with a text’s language, and semantic disambiguation of a particular word (such as a slang usage) would be a noted function of a ruby gloss.

@lujessica raised the issue that adapting ruby to another context outside of East Asian texts constitutes cultural appropriation, even when well intentioned, and what we seek to avoid is an abusive, careless appropriation that is not seeking to understand the original application. As soon as we introduce new possible contexts for ruby we are appropriating it for uses outside an original scope. However, we also noted in our conversation that the word “ruby” is itself derived from a Latin origin. It seems important in the Guidelines to recognize, as @knagasaki observes, that ruby texts are distinct from parallel text annotations. Thinking about how we present ruby in the multicultural context of the Guidelines, is a ruby annotation a short gloss targeting pronunciation or disambiguation of a word or character for readers to whom the language of the source document may be unfamiliar?

ebeshero commented 3 years ago

Here is the passage from the Wikipedia article commenting on the Japanese use of ruby for semantic help:

“Also, ruby may be used to show the meaning, rather than pronunciation, of a possibly-unfamiliar (usually foreign) or slang word. This is generally used with spoken dialogue and applies only to Japanese publications.”

duncdrum commented 3 years ago

This question cannot be answered without deciding on a wide or narrow definition of ruby.

Cultural appropriation cuts both ways here. In the narrow definition, all East Asian documents are interpreted using Japanese textual practices outside their historic or linguistic context (see our discussion about interlinear glosses), despite documentary evidence of both commentarial and rubyesque annotations occurring in the same document on the same line.

@ebeshero What is the original scope of ruby? A transcultural textual practice of interlinear glossing that took on a life of its own into the present day? Or a cultural invention largely restricted to Japanese documents, whose spread is limited to imperial expansion (bopomofo) but which is inappropriate e.g. for pinyin or Hanzi annotation of Manchu scripts?

In a wide definition, the question arises whether e.g. Anglo-Saxon documents using ruby are unduly adopting a foreign concept.

Getting ruby to the same level of efficiency that @HelenaSabel observes for Western docs is a concern I share.

As for the Wikipedia article, it leans heavily on contemporary practices, whereas TEI needs to take historic practices into account. Phonation remains a spurious concept when applied to a non-phonetic script such as ideographs. See my Manchu example. How do we determine if an annotation was intended to be (partially) phonetic? Can only alphabetized annotations be ruby? How then did the Mongols adopt Chinese characters to transliterate terms of their language during the Yuan dynasty?

lb42 commented 3 years ago

... and when you've decided that, how about Hebrew cantillation marks? Can I use tei:ruby for them?

knagasaki commented 3 years ago

> This question cannot be answered without deciding on a wide or narrow definition of ruby.
>
> Cultural appropriation cuts both ways here. In the narrow definition, all East Asian documents are interpreted using Japanese textual practices outside their historic or linguistic context (see our discussion about interlinear glosses), despite documentary evidence of both commentarial and rubyesque annotations occurring in the same document on the same line.
>
> @ebeshero What is the original scope of ruby? A transcultural textual practice of interlinear glossing that took on a life of its own into the present day? Or a cultural invention largely restricted to Japanese documents, whose spread is limited to imperial expansion (bopomofo) but which is inappropriate e.g. for pinyin or Hanzi annotation of Manchu scripts?
>
> In a wide definition, the question arises whether e.g. Anglo-Saxon documents using ruby are unduly adopting a foreign concept.
>
> Getting ruby to the same level of efficiency that @HelenaSabel observes for Western docs is a concern I share.
>
> As for the Wikipedia article, it leans heavily on contemporary practices, whereas TEI needs to take historic practices into account. Phonation remains a spurious concept when applied to a non-phonetic script such as ideographs. See my Manchu example. How do we determine if an annotation was intended to be (partially) phonetic? Can only alphabetized annotations be ruby? How then did the Mongols adopt Chinese characters to transliterate terms of their language during the Yuan dynasty?

As <ruby> is one style of annotation, TEI should distinguish it from other types of annotation. <ruby> is one kind of interlinear gloss, not the same thing as interlinear gloss in general. Since an interlinear gloss can be encoded with <note>, TEI doesn't need to use <ruby> for every interlinear gloss, even in East Asian documents. In my understanding, ruby is used to show how to read a word or a phrase, not for interlinear glosses in general.

As I cannot read Manchu, I can't decide whether it would be appropriate or not. But I don't think it was called ruby historically. If the Manchu example indicates just a translation of a word, it would be better encoded with <note>, <w> or something similar with linking attributes, instead of <ruby>.

And then,

> How do we determine if an annotation was intended to be (partially) phonetic?

If the encoder can't determine the character of the annotation, the encoder may use another element such as <note>, with an appropriate @type and @place, to lay it out as an interlinear gloss. Alternatively, @type could be used to distinguish types of relationship between <rb> and <rt>, but that might cause confusion for encoders.
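
The two options just mentioned might look roughly like this. The sample words, the xml:id, and the @type values "gloss" and "semantic" are hypothetical, project-defined choices; the Guidelines do not prescribe them, and this is only a sketch of the idea.

```xml
<!-- Fallback when the nature of the annotation is unclear: a <note> placed
     above the line and pointed at the glossed word.
     The xml:id and the type value "gloss" are hypothetical. -->
<ab>
  <w xml:id="w12">pater</w>
  <note type="gloss" place="above" target="#w12">fæder</note>
</ab>

<!-- Alternative: keep <ruby> and classify the annotation with @type on <rt>.
     "semantic" is a project-defined value, not one defined by the Guidelines. -->
<ab>
  <ruby>
    <rb>pater</rb>
    <rt type="semantic" place="above">fæder</rt>
  </ruby>
</ab>
```

The first form makes no claim about whether the annotation is phonetic. The second requires the encoder to decide, which is where the risk of confusion comes in.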

knagasaki commented 3 years ago

> Hebrew cantillation marks

How have Hebrew cantillation marks been encoded in TEI so far?

martindholmes commented 3 years ago

I think the cultural appropriation issue might be a bit of a red herring:

> the name "ruby" in fact originated from the name of the 5.5pt font size in British printing, which is about half the 10pt font size commonly used for normal text. (https://www.w3.org/TR/2001/REC-ruby-20010531/Overview.html.utf-8#what)

> In British typography, ruby was originally the name for type with a height of 5.5 points, which printers used for interlinear annotations in printed documents. In Japanese, rather than referring to a font size, the word became the name for typeset furigana. When transliterated back into English, some texts rendered the word as rubi, (a typical romanisation of the Japanese word ルビ, instead of ルビー (rubī), the expected transliteration of ruby). However, the spelling "ruby" has become more common since the W3C published a recommendation for ruby markup. In the US, the font size had been called "agate", a term in use since 1831 according to the Oxford English Dictionary. (https://en.wikipedia.org/wiki/Ruby_character#History)

So although it's surely been a feature of East Asian texts for much longer than it's been in Western texts, I think it belongs to all of us; it's just that it has a special significance for East Asian texts, and those traditions need more sophisticated encoding options to capture the nuances of its usage and its co-occurrence with other textual phenomena.

duncdrum commented 3 years ago

@knagasaki in each of these cases, we shouldn’t ignore the fact that rendering vertically laid out East Asian documents in HTML or PDF will use ruby in another namespace to achieve the desired effect.

If the TEI source of this is ruby in only some cases, but not in others, the Guidelines need to be very explicit about when that should be the case. I’m afraid we can’t avoid being explicit about what we mean by ruby, or East Asian documents for that matter: Does it matter if hanzi appear as base characters, anywhere, …

I don’t question the ability of encoders to declare an intention, but I do question why they should be asked to declare on phonation when that is not a primary feature of the writing system in use. We don’t ask encoders to make decisions based on the ideographic properties of the letter a, because it’s meaningless, both to historical agents and to the encoder.

knagasaki commented 3 years ago

> @knagasaki in each of these cases, we shouldn’t ignore the fact that rendering vertically layed out East Asian documents in html or pdf, will use ruby in another namespace to achieve the desired effect.

Thank you for giving your opinion. Thanks to our discussion, the implications of ruby and East Asian matters are gradually becoming apparent on this repo. But I'm very sorry for my poor English.

I don't think the ruby layout function in HTML will be used only for East Asian vertical documents, because it can be used purely as a layout function. It will also be used in horizontal documents in various languages if a designer wants to use it. In other words, why do you mention only(?) its possible use for vertically laid out East Asian documents?

<ruby> in HTML might not be convenient for the Manchu example, because the font size of the interlinear glosses is not smaller than that of the main text, and it seems difficult to align the glosses with the words of the base text. If <ruby> were used for the example, the style of <rt> would have to be customized. However, if the style must be customized anyway, it might be easier to use CSS alone than <ruby>. Here is an example of interlinear gloss layout without <ruby>: https://candra.dhii.jp/nagasaki/manyo/manyoviewer2021.html (TEI Source) (Sorry, it requires Japanese fonts.) Most of the glosses are marked up with <note> (and some with <add>, <corr> and so on) because they are not ruby but other types of vertical interlinear gloss. They are rendered only by CETEIcean and CSS, not by <ruby> in HTML. Using CSS (with <span>) to render glosses (and some other marks) is a recent trend in encoding complicated East Asian vertical documents in HTML. <ruby> in HTML can easily be used for simple modern text, but it is often difficult to use for complicated pre-modern woodblock-printed or handwritten text.

Anyway, while I don't know the recent policy of the Council, some elements in the Guidelines are clearly different from elements of the same name in other namespaces. So I think it is acceptable (and it is what I have been teaching so far) that the usage of some tags differs from that in other namespaces.

I want to reply to the other paragraphs, but it is time to sleep, sorry...

martindholmes commented 3 years ago

We should remember that in TEI encoding of primary source texts we are always, and only, describing what our source document looks like. We are not providing information or hints for later rendering processes. That's what the <model> element is for (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-model.html). These ruby elements are designed to provide a convenient and easily-processable way to describe the positioning etc. of specific types of annotations with relation to the base text. Whether they are rendered into similarly-named elements in the XHTML namespace is not relevant; that's an issue for the project team. At the moment, the TEI Stylesheets provide no rendering at all for ruby elements; they may one day, but that depends on someone raising tickets on the Stylesheets repo and other people responding to those tickets.
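
To make the separation concrete: a project that wanted to suggest a rendering for its ruby text could record that in its ODD rather than in the transcription. The fragment below is a minimal sketch using the TEI processing model; the choice of the inline behaviour and the CSS rule are illustrative assumptions, and nothing here is acted on by the TEI Stylesheets at present.

```xml
<!-- Hypothetical ODD fragment: a rendering hint for <rt>, kept apart from
     the transcription itself. Behaviour and CSS are assumptions. -->
<elementSpec ident="rt" mode="change">
  <model behaviour="inline">
    <outputRendition>font-size: 50%;</outputRendition>
  </model>
</elementSpec>
```

The transcription still only records where the annotation sits relative to the base text (for example with @place); how a given project displays it is decided separately.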

HTML ruby elements are not the same as TEI ruby elements, just as HTML <p> is a different thing from TEI <p>. @duncdrum is right that we need to define what we mean by ruby in TEI, but we do not have to be aggressively prescriptive about it. We don't really attempt to define what <p> means:

https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html

It would be very difficult to do, and definitions might vary a lot between traditions. (There is more information in the prose, but it basically amounts to "you'll know it when you see it".) Any project can take a strong position on what, for their purposes, counts as ruby and what should be encoded differently; they can explain their definitions in the header, and even override the gloss and desc elements in the elementSpecs to provide more detailed definitions if they wish to. But I think the Guidelines themselves should avoid taking strong positions and drawing hard lines between different types of annotation, especially in the early stages of implementing support for ruby.
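
A project-level override of that kind could look something like the following ODD fragment. The schema identifier, the list of modules, and the wording of the gloss and desc are all hypothetical; they stand for one project's policy, not for a definition proposed for the Guidelines.

```xml
<!-- Hypothetical project ODD: narrows the documented meaning of <ruby>
     for this project only. Identifier and wording are assumptions. -->
<schemaSpec ident="myProject" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="ruby" module="core" mode="change">
    <gloss xml:lang="en">ruby (reading aid)</gloss>
    <desc xml:lang="en">contains a base text together with a short
      interlinear annotation giving its reading. In this project,
      word-level translations are encoded with note instead.</desc>
  </elementSpec>
</schemaSpec>
```

Because the change lives in the project's own customization, other projects remain free to adopt a wider or narrower reading of ruby.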

MegJBrown commented 3 years ago

In discussion with @hcayless @martinascholger and @martindholmes, added sentence noting that although ruby is included in the TEI for use with East Asian languages, users may find the encoding useful in a variety of contexts.