DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Representation of the ending for the inflected <form> #76

Closed daliboris closed 4 years ago

daliboris commented 4 years ago

As mentioned in section 4.4, "Representation of inflected forms", of TEI Lex-0, dictionaries of inflectionally rich languages give additional forms next to the lemma. Very often, however, such a form is not a full form but only part of one (e.g. the ending, in the case of Czech or Slovene).

Which would be the better way to encode this information: <form type="inflected ending">, or <orth><m>-a</m></orth>?

At present, <m> is not allowed within the <orth> element (only <c> and <pc> are).

A larger example:

Printed form: abeceda, -y (alphabet)

<form type="lemma" xml:id="en000008.hw1" style="font-weight: bold;">
  <orth>abeceda</orth>
</form>
<pc style="font-weight: bold;">,</pc>
<form type="inflected">
  <gramGrp>
    <case value="genitiv" />
    <number value="singular" />
  </gramGrp>
  <orth>
    <m type="ending">-y</m>
  </orth>
</form>

or

<form type="lemma" xml:id="en000008.hw1" style="font-weight: bold;">
  <orth>abeceda</orth>
</form>
<pc style="font-weight: bold;">,</pc>
<form type="inflected ending">
  <gramGrp>
    <case value="genitiv" />
    <number value="singular" />
  </gramGrp>
  <orth>-y</orth>
</form>

laurentromary commented 4 years ago

There is an attribute on <orth> for this purpose: @extent (see https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.partials.html), which should remove the need to wrap the ending in an <m>. Still, it may be appropriate to do as you suggest, in which case we would have to reintroduce the element in the specification.

xlhrld commented 4 years ago

It's a question of perspective: does the inflectional ending just denote the ending as such (i.e. is it a way to indicate an inflectional class), or is it rather an abbreviation for a word form of the lemma with certain grammatical features (here: genitive singular)? I'd lean towards the latter, in which case I'd also supply orth/@expand to provide the full form:


<form type="lemma" xml:id="en000008.hw1" style="font-weight: bold;">
  <orth>abeceda</orth><pc>,</pc>
</form>
<form type="inflected">
  <gramGrp>
    <case value="genitiv" />
    <number value="singular" />
  </gramGrp>
  <orth extent="suff" expand="abecedy">-y</orth>
</form>

daliboris commented 4 years ago

In our (i.e. Czech diachronic) lexicographical tradition, this kind of information (the genitive singular ending for nouns, and the 1st and 3rd person present indicative endings for verbs) serves both functions: it is an abbreviation for a word form as such and, in combination with the lemma (in most cases with the ending of the lemma), it denotes the inflectional class.

The dictionaries don't define inflectional classes; the user must be familiar with the historical grammar of Czech.

After your suggestions and explanations, I think using orth with @extent and @expand is the suitable solution.
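
For illustration only, here is a minimal sketch of how such an encoding could be sanity-checked. It assumes Python's standard xml.etree.ElementTree; the helper name check_expansions and the wrapping <entry> element are hypothetical and not part of TEI Lex-0 or of the example above. The check simply verifies that each @expand value ends with the printed ending.

```python
import xml.etree.ElementTree as ET

# The example from this thread, wrapped in a hypothetical <entry> root so that
# it parses as a single document (styling and xml:id attributes omitted).
SAMPLE = """
<entry>
  <form type="lemma">
    <orth>abeceda</orth>
  </form>
  <form type="inflected">
    <gramGrp>
      <case value="genitiv" />
      <number value="singular" />
    </gramGrp>
    <orth extent="suff" expand="abecedy">-y</orth>
  </form>
</entry>
"""


def check_expansions(xml_text):
    """Return (ending, expand, ok) for every <orth> marked extent="suff".

    ok is True when @expand is present and ends with the printed ending
    (the leading hyphen of the ending is ignored).
    """
    root = ET.fromstring(xml_text)
    results = []
    for orth in root.iter("orth"):
        if orth.get("extent") != "suff":
            continue
        ending = (orth.text or "").strip().lstrip("-")
        expand = orth.get("expand")
        ok = expand is not None and ending != "" and expand.endswith(ending)
        results.append((ending, expand, ok))
    return results


print(check_expansions(SAMPLE))
```

The sample carries no namespace; in a real TEI document the elements live in the TEI namespace, so the lookup would need the qualified name {http://www.tei-c.org/ns/1.0}orth instead of the bare orth.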

Thank you.