TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

example in note to specs of `measure/@type` are not precisely fitting the spec? #2529

Open GVogeler opened 4 months ago

GVogeler commented 4 months ago

In https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.measurement.html the note uses commodity="ice cream" as an example. The specs define the data type of commodity as "teidata.word separated by whitespace". A consistent interpretation of the example would therefore claim that two commodities are measured ("ice" and "cream"), while the text suggests that "ice cream" is not a list but a compound noun ("ice-cream").

sydb commented 4 months ago

I believe the problem lies not with the example, but with the re-invention of teidata.word from being nothing more than a useful syntactic structure to avoid crazy characters, to essentially an atomic token of information. Thus the originalist interpretation of teidata.word+ (i.e., “1–∞ occurrences of teidata.word separated by whitespace”) is that the value of @commodity is a single string that can contain only characters that have the Unicode general property of Letter, Number, Punctuation, or Symbol, or spaces. I believe the intent was always that the value of @commodity be singular, whether there is a space in the commodity name or not. (Although the Guidelines never say this, the word “commodity” is always used in the singular.)

But the modern definition might well be that the value of @commodity is a set of tokens, each of which must be composed of characters that have the Unicode general property of Letter, Mark, Number, Punctuation, or Symbol, and each of which carries its own semantics. I personally object to this interpretation, but I may be an opposition party of one.
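
To make the two readings concrete, here is a minimal Python sketch. It is not taken from the Guidelines or from any TEI tooling: the function names (`is_word`, `as_single_string`, `as_token_list`) are hypothetical, and the allowed-category set is an assumption based on the second reading above (Letter, Mark, Number, Punctuation, Symbol). Both functions accept the same values; they differ only in what they say the value means.

```python
import re
import unicodedata
from typing import List

# Assumption: major Unicode general categories allowed inside one token
# (Letter, Mark, Number, Punctuation, Symbol), per the second reading above.
ALLOWED_MAJOR = {"L", "M", "N", "P", "S"}

# XML whitespace: space, tab, carriage return, line feed.
XML_WS = re.compile(r"[ \t\r\n]+")


def is_word(token: str) -> bool:
    """True if token is non-empty and every character is in an allowed category."""
    return bool(token) and all(
        unicodedata.category(ch)[0] in ALLOWED_MAJOR for ch in token
    )


def _tokens(value: str) -> List[str]:
    """Split on XML whitespace and check each piece; require at least one token."""
    tokens = [t for t in XML_WS.split(value) if t]
    if not tokens or not all(is_word(t) for t in tokens):
        raise ValueError(f"not a valid whitespace-separated word list: {value!r}")
    return tokens


def as_single_string(value: str) -> str:
    """Reading 1: the whole value names ONE commodity; spaces are just characters."""
    return " ".join(_tokens(value))


def as_token_list(value: str) -> List[str]:
    """Reading 2: each whitespace-separated token is its own unit of meaning."""
    return _tokens(value)


print(as_single_string("ice cream"))  # one commodity
print(as_token_list("ice cream"))     # two separate tokens
print(as_token_list("ice-cream"))     # hyphenated form stays one token
```

Under the first reading commodity="ice cream" is a single commodity; under the second it is two, and only a spelling such as "ice-cream" would keep it as one.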