TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

language problems #2447

Closed sydb closed 10 months ago

sydb commented 1 year ago
  1. The <desc> for the langUsage/language/@ident attribute says “and which is referenced by the global @xml:lang attribute”. That sort of implies it must be referenced by the @xml:lang attribute. But surely one can have a document that uses language/@ident but does not use @xml:lang.
  2. The <desc> for the langUsage/language/@usage attribute says “specifies the approximate percentage (by volume) of the text which uses this language.” What does “volume” mean in this context?
  3. The example in the tagdoc for <language> includes <language ident="i-az-Arab" …>Azerbaijani in Arabic script</language>, but I do not think i-az is a valid language tag per RFC 5646 (part of BCP 47), and it is not listed in the registry.
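
To illustrate point 1, here is a rough Python sketch that lists language/@ident values never referenced by any @xml:lang in a document. The SAMPLE fragment and the function name unreferenced_idents are hypothetical, made up for this illustration; the fragment is well-formed but not a complete, valid TEI file.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: two declared languages, only one used via @xml:lang.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language ident="en" usage="90">English</language>
        <language ident="la" usage="10">Latin</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text><body><p>Some text, <foreign xml:lang="la">et cetera</foreign>.</p></body></text>
</TEI>"""


def unreferenced_idents(tei_xml):
    """Return declared language/@ident values that no @xml:lang refers to."""
    root = ET.fromstring(tei_xml)
    declared = [
        el.get("ident")
        for el in root.iter(f"{{{TEI_NS}}}language")
        if el.get("ident")
    ]
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)}
    return [ident for ident in declared if ident not in used]


print(unreferenced_idents(SAMPLE))
```

Here "en" is declared but never referenced, which is the perfectly reasonable case that the wording “is referenced by” seems to rule out.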

bansp commented 1 year ago

I'll slap the LinSIG label here just to make sure that it doesn't escape the group. I'm especially intrigued by the last point.

lb42 commented 1 year ago

A quick glance at Wikipedia suggests that i-az is a typo for az or maybe aze

sydb commented 1 year ago

Well, whether a typo or not, the correct tag is almost certainly “az-Arab”. But one has to wonder if the point of the example was to demonstrate one of the (26) grandfathered tags, of which 13 start with “i-”, but none of which are for Azerbaijani.
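
For anyone who wants to check those counts, here is a minimal Python sketch. The tag lists are copied from the "irregular" and "regular" grandfathered productions in RFC 5646, section 2.1. The helper names and the simplified language-plus-script pattern are my own assumptions for illustration; the pattern checks shape only and does not consult the IANA registry.

```python
import re

# Grandfathered tags as listed in the RFC 5646 ABNF (section 2.1).
IRREGULAR = {
    "en-GB-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-BE-FR", "sgn-BE-NL", "sgn-CH-DE",
}
REGULAR = {
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}
GRANDFATHERED = {tag.lower() for tag in IRREGULAR | REGULAR}

# Simplified shape (an assumption, not the full ABNF): a 2-3 letter
# primary language subtag, optionally followed by a 4-letter script subtag.
LANG_SCRIPT = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?")


def is_grandfathered(tag):
    """True if the tag is one of the grandfathered tags (case-insensitive)."""
    return tag.lower() in GRANDFATHERED


def looks_like_lang_script(tag):
    """True if the tag has the simplified language[-script] shape."""
    return LANG_SCRIPT.fullmatch(tag) is not None


for tag in ("i-az-Arab", "az-Arab"):
    print(tag, is_grandfathered(tag), looks_like_lang_script(tag))
```

With these lists, “i-az-Arab” is neither grandfathered nor of the language-script shape, while “az-Arab” has the expected shape.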

trishaoconnor commented 10 months ago

Updated descriptions for the attributes (changes indicated by italics in the first bullet point) and corrected language code in the example.

  1. Updated the description for the langUsage/language/@ident attribute to "which is used to identify the language documented by this element, and which may be referenced by the global <att>xml:lang</att> attribute."
  2. Updated the description for the langUsage/language/@usage attribute to "specifies the approximate percentage of the text which uses this language."
  3. Updated i-az-Arab to az-Arab

See PR #2502.