TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

exemplum and language #1933

Open hcayless opened 5 years ago

hcayless commented 5 years ago

<exemplum> can have an @xml:lang, but there is an ambiguity that may make for bad practice here. Presumably @xml:lang refers to the language of the example, but it is also used in the Guidelines generation process to decide which example(s) to apply to which translations: if there is a Chinese example, for instance, it appears on the Chinese translation page in preference to the English one.

And, of course, if there isn't an example in the current translation language, English wins.

But this seems rather unsophisticated. An example in a RTL writing system might be more suitable for other languages with RTL writing than English. Likewise a logographic example might work better for other logographic writing systems.

Moreover, when it comes to text direction, the language of the example has (or at least should have) special implications for the rendering of the example, but that concern seems like it should be orthogonal to the question of whether the example should show up on a given language's spec page.

tl;dr: I wonder whether we need a distinct attribute to determine the applicability of examples across different languages. Such an attribute would need to permit multiple language codes as well as an "any" or "all" option.

bansp commented 5 years ago

Thanks so much for this, Hugh. I've been uncomfortable with this state of affairs for a long time, but thought that it was an accepted design decision, and there have always been bigger things to complain about... ;-) I'm all for an extra attribute such as the one you describe. In att.linguistic, probably all of the "en" or "de" in the examples should be "all", as far as their applicability goes.

hcayless commented 5 years ago

Thanks Piotr! Nice to have confirmation I'm not imagining this.

duncdrum commented 5 years ago

This sounds very much like a band-aid for <exemplum>, while the source of the problem lies with the state of i18n in TEI and the Guidelines more broadly.

I find it hard to imagine how this would work in practice. If, say, there is an example in hieroglyphs, how should one decide whether it goes with logographic CJK scripts or with the Ancient Greek and Latin examples? If every exemplum ends up taking the "all" option, what's the point? Do we really want an attribute that potentially takes all the distinct language tags present in this repo as its value, or even language tags that have not been used yet, just in case someone provides an exemplum in that language at a later point?

see #960

hcayless commented 5 years ago

I don't agree with your characterization, @duncdrum, and I can provide examples where this already stitches us up: the first example for metamark, https://github.com/TEIC/TEI/blob/dev/P5/Source/Specs/metamark.xml#L33-L40 is in Norwegian, and was probably meant to be printed on https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-metamark.html, but it isn't, because it has been (correctly) marked as being in Norwegian.

I think there are only a handful of cases like this, but it's symptomatic of the larger problem.

And yes, our I18n is a dumpster fire, but that doesn't mean we shouldn't look for ways to facilitate improvements. There is a separate usability question around how many examples are appropriate for display on a spec page, and how those should be chosen.

duncdrum commented 5 years ago

I'm still not sure how you imagine the proposed attribute helping with the Norwegian ("no") example. If something isn't in the HTML, wouldn't the alternative be to handle this in the stylesheets that generate the HTML? That would give us a central place to define, once, that "de" and "en" exemplars should appear together, instead of stating it on every exemplum in the XML.

What happens if a new exemplum language is added down the road? Does the submitter have to verify that each and every previous exemplum marked "all" actually still applies to all? What happens with languages that have multiple writing directions (Chinese can be RTL or LTR)? What are the criteria for deciding which values to put in the new attribute? Do they need to be set on a per-exemplum basis, or are they general enough to be defined once?

I'm all for facilitating improvements, but from your description I'm not sure how this is supposed to work, and I fear maintainability headaches down the road.

hcayless commented 5 years ago

It would allow you to make clear that the example was generally applicable. Currently, the Stylesheets' logic would only do that if you set @xml:lang to 'en', 'mul' (multiple), or 'und' (undetermined).

Right now, we conflate example language with example applicability, and I'm wondering whether we should separate those two concerns. I don't really understand why you think this will turn into a maintenance problem. In practice, I think you'd have two main buckets, 1) examples that are generally applicable, and 2) examples that are locked to a particular language, and then maybe a few where there's a small list of languages to which the example should apply. The upper bound on that list is going to be the number of languages we translate the spec pages into (minus one). Adding a new translation language might cause you to revisit those few, I suppose.
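To make the distinction concrete, here is a minimal sketch in Python (not the XSLT the Stylesheets actually use). `current_rule` is a simplified model of the behaviour described in this thread: an example in the page's language wins, otherwise only examples tagged 'en', 'mul', or 'und' are shown. `proposed_rule` assumes a hypothetical separate applicability value that is either "all" or a space-separated list of language codes. The function names, the dictionary layout, the sample data, and the value syntax are all illustrative assumptions.

```python
# Language tags the thread says are treated as generally applicable today.
GENERAL_LANGS = {"en", "mul", "und"}


def current_rule(examples, doc_lang):
    """Simplified model: selection is driven by the example's own language."""
    own = [e for e in examples if e["lang"] == doc_lang]
    if own:
        return own
    return [e for e in examples if e["lang"] in GENERAL_LANGS]


def proposed_rule(examples, doc_lang):
    """Hypothetical: selection is driven by a separate applicability value."""
    selected = []
    for e in examples:
        targets = e["applies_to"].split()
        if "all" in targets or doc_lang in targets:
            selected.append(e)
    return selected


# Hypothetical data: a Norwegian example meant for every page, and a
# Chinese example meant only for the Chinese page.
examples = [
    {"id": "norwegian-example", "lang": "no", "applies_to": "all"},
    {"id": "chinese-example", "lang": "zh", "applies_to": "zh"},
]

shown_now = [e["id"] for e in current_rule(examples, "en")]
shown_proposed = [e["id"] for e in proposed_rule(examples, "en")]
```

Under these assumptions the English page shows nothing with the current rule, which is the metamark situation, while the proposed rule shows the Norwegian example without mislabelling its language.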

tuurma commented 5 years ago

@ebeshero would you be happy to take this issue?

sydb commented 4 years ago

Council VF2F group agrees with @ebeshero’s thought that this needs more discussion, and thinks the I18N SIG is probably the right place for that discussion.

ebeshero commented 4 years ago

Council VF2F 2020-10-24: Council would first like to try outputting all the examples in all languages (tagdocs and Guidelines) to see how this looks, and whether it is overwhelming or just fine. This would be a good Stylesheets Group activity. Alternative suggestions for changing the logic of which exempla appear involve possibly introducing an <exemplumGroup> for grouping examples that are identical translations of one another, or an attribute that assists in selection of examples (either an attribute to indicate which examples take priority, or an attribute to indicate that an example is not necessary to display).

martindholmes commented 3 years ago

@hcayless in Slack summarizes his view of the situation thus:

I wonder whether we need a distinct attribute to determine the applicability of examples across different languages. Such an attribute would need to permit multiple language codes as well as an "any" or "all" option.

I think this is exactly right. We want something parallel to the @docLang attribute on <schemaSpec>, but instead of specifying the language to use for the documentation being created, it would specify the documentation language(s) for which the exemplum is appropriate/suitable/applicable. @docLangUsage?

ebeshero commented 3 years ago

I just assigned a few more people involved with i18n to this, since it seems like we might be more ready to move forward with it now(?)...

ebeshero commented 3 years ago

VF2F proposal for showing examples in multiple languages:

<exemplumGrp>, content model { exemplum+ }, with a Schematron check that one and only one child <exemplum> has @targetLangs of "mul". The heuristic is that the processor chooses one and only one child <exemplum> for the output, looking at @targetLangs and trying to choose the best match. If it can't find a match at all, it chooses the "mul".
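A rough sketch of how that heuristic might behave, written in Python with the standard library's ElementTree rather than in XSLT or Schematron. <exemplumGrp> and @targetLangs are the proposals from this thread and are not part of any released TEI schema. The sample markup, the space-separated value syntax, and the "first match wins" reading of "choose best" are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup illustrating the proposal; namespaces omitted for brevity.
SAMPLE = """
<exemplumGrp>
  <exemplum targetLangs="mul"><eg>general example</eg></exemplum>
  <exemplum targetLangs="de fr"><eg>Beispiel / exemple</eg></exemplum>
  <exemplum targetLangs="zh"><eg>Chinese example</eg></exemplum>
</exemplumGrp>
"""


def target_langs(exemplum):
    """Split the (assumed) space-separated @targetLangs value into a list."""
    return (exemplum.get("targetLangs") or "").split()


def check_group(group):
    """Mirror the proposed Schematron rule: exactly one child is 'mul'."""
    mul = [e for e in group.findall("exemplum") if target_langs(e) == ["mul"]]
    return len(mul) == 1


def choose_exemplum(group, doc_lang):
    """Return one child: the first that lists doc_lang, else the 'mul' one."""
    fallback = None
    for ex in group.findall("exemplum"):
        targets = target_langs(ex)
        if doc_lang in targets:
            return ex
        if targets == ["mul"]:
            fallback = ex
    return fallback


group = ET.fromstring(SAMPLE)
chosen_fr = choose_exemplum(group, "fr")
chosen_ja = choose_exemplum(group, "ja")
```

Here a French page would get the "de fr" example, and a Japanese page, with no match, would fall back to the "mul" one.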

hcayless commented 2 years ago

Note has some examples of translated exempla. See https://github.com/TEIC/TEI/blob/077922174659344813a2ed4b8aa0eb8ab07e5636/P5/Source/Specs/note.xml#L37 and https://github.com/TEIC/TEI/blob/077922174659344813a2ed4b8aa0eb8ab07e5636/P5/Source/Specs/note.xml#L62 as well as https://github.com/TEIC/TEI/blob/c358e91d584217169aecbad401070603a9dcc5c5/P5/Source/Specs/note.xml#L103, https://github.com/TEIC/TEI/blob/c358e91d584217169aecbad401070603a9dcc5c5/P5/Source/Specs/note.xml#L116, and https://github.com/TEIC/TEI/blob/077922174659344813a2ed4b8aa0eb8ab07e5636/P5/Source/Specs/note.xml#L129

sydb commented 2 years ago

Cf. the following from att.canonical — it has "fr" inside "en":

<exemplum xml:lang="en">
  <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="fr" source="#NONE">
    <author>
      <name key="Hugo, Victor (1802-1885)" ref="http://www.idref.fr/026927608">Victor Hugo</name>
    </author>
  </egXML>
</exemplum>
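Cases like this one could be found mechanically. The following is a small sketch in Python (standard library only) that reports every <egXML> whose @xml:lang differs from that of its enclosing <exemplum>. The function name is made up for illustration, and the snippet is parsed on its own without the TEI namespace that <exemplum> would have in a real spec file.

```python
import xml.etree.ElementTree as ET

# Expanded names as ElementTree reports them.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
EGXML = "{http://www.tei-c.org/ns/Examples}egXML"

SNIPPET = """
<exemplum xml:lang="en">
  <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="fr" source="#NONE">
    <author>
      <name key="Hugo, Victor (1802-1885)" ref="http://www.idref.fr/026927608">Victor Hugo</name>
    </author>
  </egXML>
</exemplum>
"""


def language_mismatches(exemplum):
    """List (outer, inner) language pairs where egXML disagrees with exemplum."""
    outer = exemplum.get(XML_LANG)
    return [
        (outer, eg.get(XML_LANG))
        for eg in exemplum.iter(EGXML)
        if eg.get(XML_LANG) not in (None, outer)
    ]


mismatches = language_mismatches(ET.fromstring(SNIPPET))
```

For the att.canonical snippet this yields a single ("en", "fr") pair, which is exactly the split between "which page is this for" and "what language is the example in" that this issue is about.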