TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

definition of `<mentioned>` too restrictive #2398

Open sydb opened 1 year ago

sydb commented 1 year ago

Ash Clark came across the following passage in An Essay to Revive the Ancient Education of Gentlewomen, a 1673 work by Bathsua Makin.

I confeſs this part of Speech is moſt difficult to be known in the Engliſh Tongue; yet it may be done thus, All words ending in ing, d, t, or n, which have no ſign at all, and may be reſolved into Verbs, are Participles, as learning, which doth learn; learned, which is learned.

It is clear (at least to me) that the “ing”, ‘d’, ‘t’, and ‘n’ are mentioned, not used. The description of <mentioned> in the tagdoc (and thus, in some sense, the definition of that element) is “marks words or phrases mentioned, not used”. Even the more complete description in the prose says <mentioned> is for “where a word or phrase is being discussed in the body of a text rather than forming part of the text directly”. But these things are neither phrases nor even words; they are most certainly sub-word thingies.
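For illustration, the encoding at issue might look something like the sketch below. It assumes the passage sits in a plain `<p>` and tags only the four suffixes; whether “learning” and “learned” should also be tagged is a separate encoding choice not settled here.

```xml
<!-- A sketch only: the <p> wrapper and the decision to tag just the
     four suffixes are assumptions made for illustration. -->
<p xmlns="http://www.tei-c.org/ns/1.0">All words ending in
  <mentioned>ing</mentioned>, <mentioned>d</mentioned>,
  <mentioned>t</mentioned>, or <mentioned>n</mentioned>, which have no
  ſign at all, and may be reſolved into Verbs, are Participles, as
  learning, which doth learn; learned, which is learned.</p>
```

Nothing in the schema prevents this, since `<mentioned>` accepts plain text; the mismatch is only with the element's prose description.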

Of course a word can be mentioned, not used. (My father taught me this distinction ½ a century ago with something like “Boston is a city of .6 million people; ‘Boston’ is six letters long.”)

Certainly a phrase can be mentioned, not used — “The phrase ‘man’s best friend’ is often attributed to Frederick the Great of Prussia.” But so can a clause — “The saying ‘Injustice anywhere is a threat to justice everywhere’, extracted from King’s Letter from Birmingham Jail, often appears on lawn signs and bumper stickers.” (Yes, one might choose to encode either of those with <q> or <quote>, but one might also quite reasonably choose <mentioned>.)

And, as shown above, so might a suffix or prefix — “The ‘epi’ of ‘epinephrine’ is analogous to the ‘ad’ of ‘adrenaline’.”; or, for that matter, a lemma or stem — “The ‘nephr’ of ‘epinephrine’ is analogous to the ‘renal’ of ‘adrenaline’.” Or even a letter — “An ‘a’ can appear drastically different in different typefaces.”

At the moment, I am not entirely sure what (if anything) should be done about this.

bansp commented 1 year ago

While not beautiful, the following might do the job, perhaps:

It only implies other types of spans rather than enumerating them, but then, the examples you've come up with would round out the spec.

lawrenceevalyn commented 1 year ago

Is 'textual fragments' overcomplicating it? What about "marks text that is mentioned, not used"? And "where text is being discussed in the body of a work rather than forming part of the work directly"?
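As a rough sketch of how the shorter wording would read in a specification, here is how a project might try it out in a local ODD customization. The `mode="change"` approach and the bare `<desc>` are assumptions for illustration; the actual change to the Guidelines source might be structured differently.

```xml
<!-- Hypothetical local customization, not the Guidelines source itself. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="mentioned" module="core" mode="change">
  <desc xml:lang="en">marks text that is mentioned, not used</desc>
</elementSpec>
```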

ebeshero commented 1 year ago

I like @lawrenceevalyn's concise solution. As I was reading this I was thinking, "particles" or "grams," but we need not that complexity. :-)

sydb commented 1 year ago

Indeed. Both @bansp’s and @lawrenceevalyn’s solutions had me thinking “why didn’t I think of that?” :smile:

martindholmes commented 1 year ago

How about "words, phrases, or components of words"?