LingSIG / wordAttributes

work space for a coherent proposal for inline attributes of <w> in TEI XML

hyperlemma-old and hyperlemma-new attributes #5

Closed daliboris closed 6 years ago

daliboris commented 7 years ago

I suggest two new attributes: @hyperlemma-old and @hyperlemma-new. In modern editions of historical Czech we use transcription for the "base" text, i.e. editors use standard modern Czech spelling while preserving all the relevant features of the early developmental phases of Czech (word boundaries, phonology, morphology). In our approach, a hyperlemma is the common form shared by all lemma forms of a word in a diachronic corpus. @hyperlemma-old contains the oldest form (as can be assumed for the year 1300 for Old Czech), and @hyperlemma-new contains the newest form (modern Czech). Example for the Old Czech word kóň 'horse' (which after sound changes also appears in the forms kouň and kůň):
<w lemma="kóň" hyperlemma-old="kóň" hyperlemma-new="kůň">kóň</w>
<w lemma="kouň" hyperlemma-old="kóň" hyperlemma-new="kůň">kouň</w>

bansp commented 7 years ago

Thanks, Dalibor. My gut feeling is that this would be too much for the Council to accept (as too specific), but it would be great to keep these precisely as examples of cases that are real but outside the scope of the proposal, which should be extremely minimalistic and thus very general. In other words, we could use them to demonstrate that it is conceivable to suggest more, but that we've done the homework of reducing our request.

This would entail two kinds of potential practical solutions. Either defining a straightforward extension in its own namespace (e.g. <w lemma="kóň" czxx:hyperlemma-old="kóň" czxx:hyperlemma-new="kůň">kóň</w>)

... or using feature structures (and thus either an extra layer of annotation or straightforwardly standoff annotation).
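
To make the two options concrete, here is a minimal sketch of how each might look for the kóň/kouň example. The namespace URI bound to czxx, the xml:id value, the feature names inside <fs>, and the choice of @ana as the linking mechanism are all assumptions made for illustration. Nothing here is part of the proposal itself, and a fully standoff variant would keep the <fs> in a separate document.

```xml
<!-- Sketch only: the czxx namespace URI, the xml:id and the feature names are hypothetical. -->
<ab xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:czxx="http://example.org/ns/czxx">

  <!-- Option 1: project-specific attributes kept in their own namespace -->
  <w lemma="kóň" czxx:hyperlemma-old="kóň" czxx:hyperlemma-new="kůň">kóň</w>
  <w lemma="kouň" czxx:hyperlemma-old="kóň" czxx:hyperlemma-new="kůň">kouň</w>

  <!-- Option 2: plain <w> elements pointing, via @ana, to one shared feature structure -->
  <w lemma="kóň" ana="#hl-kun">kóň</w>
  <w lemma="kouň" ana="#hl-kun">kouň</w>

  <!-- The hyperlemma information is stated once and reused by both tokens -->
  <fs xml:id="hl-kun" type="hyperlemma">
    <f name="old"><string>kóň</string></f>
    <f name="new"><string>kůň</string></f>
  </fs>
</ab>
```

The first variant keeps everything inline at the cost of a private namespace. The second states the hyperlemma pair only once, but readers and tools have to follow a pointer to retrieve it.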

So the line of argumentation would be: there are quite a number of real, attested, necessary information containers, including e.g. hyperlemma-{old|new}. Our proposal only targets (let's say) lemma, pos and reg, as a compromise between the relative immutability of the Guidelines as a whole on the one hand and, on the other hand, the pressure to make more options available to linguists out-of-the-box (or else the linguists will go away to other formats).

bansp commented 6 years ago

Thanks for the input, closing this issue after a merger with the TEI main branch.