DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Missing elements: <item> and <list> #167

Closed anacastrosalgado closed 1 year ago

anacastrosalgado commented 1 year ago

Concerning the encoding of the introductory pages of MORAIS, I noticed that the TEI Lex-0 specification does not include the element <item>. As I’m encoding the list of abbreviations, this element is essential.

Examples:

<list rend="simple" xml:lang="pt">
   <item>
      <abbr type="POS" norm="adjective">adj.</abbr>
      <expan>Adjectivo</expan>
      <p>.</p>
   </item>
   <!-- ... -->
   <item>
      <abbr type="domain">Rhet.</abbr>
      <expan>Rhetorico</expan>
      <p>.</p>
   </item>
   <item>
      <abbr type="POS" norm="noun">S.</abbr>
      <expan>Subſtantivo</expan>
      <p>.</p>
   </item>
   <item>
      <abbr type="number">Sing.</abbr>
      <expan>Singular</expan>
      <p>.</p>
   </item>
   <!-- ... -->
</list>

Can you bring back this TEI element? The same for <list>.

daliboris commented 1 year ago

In the Persian-Czech Dictionary we use the <taxonomy> element for this purpose.

<taxonomy xml:id="LeDIIR.taxonomy.grammar">
    <bibl>Parts-of-speech</bibl>
    <category xml:id="LeDIIR.taxonomy.adv"
             n="46e4fe08-ffa0-4c8b-bf98-2c56f38904d9">
      <catDesc xml:lang="en">
         <idno>adv</idno>
         <term>Adverb</term>
         <gloss>An adverb, narrowly defined, is a part of speech whose members modify verbs for such categories as time, manner, place, or direction. An adverb, broadly defined, is a part of speech whose members modify any constituent class of words other than nouns, such as verbs, adjectives, adverbs, phrases, clauses, or sentences. Under this definition, the possible type of modification depends on the class of the constituent being modified.</gloss>
      </catDesc>
      <catDesc xml:lang="cs-CZ">
         <idno>adv</idno>
         <term>adverbium</term>
         <gloss>Adverbium (příslovce). Jelikož i zde slovní druh charakterizuje významovou funkci českého ekvivalentu, řadíme do této kategorie i adverbializované skupiny tvořené pomocí předložek, příslovečné spřežky, např. [be-sádegí] (snadno), [baráje abad] (navěky), [az (rú-je) ettefágh] (náhodou), [tá hálá] (doposud). Jinak jsou typickými slovotvornými příponami adverbií např. z arabštiny přejatá přípona [-an], resp. přípona [-áne], která vzniklé odvozenině dává současně význam adverbia pro neživotné subjekty.  O tvorbě adverbií v perštině viz t. <ref target="https://www.jahanshiri.ir/fa/en/adverb-formation"
                 type="external"/>
         </gloss>
      </catDesc>
      <catDesc xml:lang="fa">
         <idno>قید</idno>
         <term>قید</term>
         <gloss/>
      </catDesc>
    </category>
</taxonomy>

anacastrosalgado commented 1 year ago

Hello, @daliboris! Thanks for sharing. I will analyse it and get back to you. It is the first time I'm encoding the front matter of a dictionary. I'm delighted.

anacastrosalgado commented 1 year ago

@daliboris, thanks. I have a classification proposal for the list itself. I was using the type attribute for this: type distinguishes abbreviations by their function, norm is used on POS abbreviations to follow the Universal POS tags, and expan supplies the expansion. In the printed edition, the list has no divisions, and the proposal is to have this classification: POS; usage (domain; time; geographic; sociocultural; textType; frequency); gender; number; grammar (verb subcategories; degrees of adjectives); hint (chapter indications; other text). Are the abbreviations classified in the printed edition of the Persian-Czech Dictionary? Now I'm curious... I see "grammar" in your taxonomy.

>>> @ttasovac @laurentromary

ttasovac commented 1 year ago

I love taxonomies so what @daliboris is suggesting has a special place in my heart 😄 But taxonomies go into <classDecl> in the header.

What Ana is asking here, however, is for us to consider allowing lists and items for the encoding of, well, lists and items in the dictionary front matter. So it's not about the header, but rather about the content of the dictionary before entries proper. And there, we often find lists of abbreviations used.

So I think this is a good suggestion. And I would vote for it. (And implement it, once I figure out what the hell is wrong with my oXygen workflow...)
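
The distinction drawn above, between a taxonomy in the header and a list in the front matter, can be sketched as follows. This is a minimal illustration in plain TEI, not a validated TEI Lex-0 document: the xml:id values and the type value on div are hypothetical, and the fileDesc content is left out.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt, publicationStmt, sourceDesc omitted in this sketch -->
    </fileDesc>
    <encodingDesc>
      <!-- Taxonomies are declarations about the encoding: they live in the header -->
      <classDecl>
        <taxonomy xml:id="example.taxonomy.pos">   <!-- hypothetical id -->
          <category xml:id="example.pos.adj">      <!-- hypothetical id -->
            <catDesc>Adjective</catDesc>
          </category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <front>
      <!-- The printed list of abbreviations is dictionary content: it lives in the front matter -->
      <div type="abbreviations">                   <!-- hypothetical type value -->
        <list>
          <item><abbr>adj.</abbr> <expan>Adjectivo</expan></item>
        </list>
      </div>
    </front>
    <body>
      <!-- dictionary entries -->
    </body>
  </text>
</TEI>
```

The two structures are not mutually exclusive: a taxonomy can describe the categories while the front-matter list transcribes what the source actually prints.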

daliboris commented 1 year ago

Hi @anacastrosalgado, I decided to use the @xml:id attribute for different types of taxonomy/abbreviations. And each taxonomy (group of abbreviations) has its own label. In Persian-Czech Dictionary we use these ids:

The Persian-Czech Dictionary is born-digital and will be available only via web and mobile applications, so I'm not sure what you mean by printed edition. Do you mean a printed version of the generated web page?

Btw, I plan to generate a list of abbreviations from the taxonomies directly from . For now, we use data from the taxonomy in the tooltip:

[Image: Taxonomy-tooltip]

daliboris commented 1 year ago

Regarding the front matter and <taxonomy>: @ttasovac is right, <taxonomy> can't be used in <front> or its subelements. It was my misunderstanding.

ttasovac commented 1 year ago

I've started working on this in the dev-0.9.2 branch.

Just to recap: we want to bring back list and item to TEI Lex-0 because the dictionary front matter often contains lists (of abbreviations, domains, etc.). However, and this is really important for me, I don't want lists to pop up everywhere they do in vanilla TEI, including inside certain dictionary elements:

  1. some elements, like etym, allow lists because they have model.inter in their content model; and
  2. some elements, like def, allow lists because their content model includes (my mortal enemy) macro.paraContent.

With b7cb41cdcc4fbdec665938f3d77bfa5fb9d16d87, I have taken care of point 1: list and item are back in the game, but I made sure that they are not allowed in dictScrap, etym, form, gramGrp and xr.
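
For readers unfamiliar with how such restrictions are written, here is a minimal sketch of one ODD mechanism that relates to point 1. It is an assumption for illustration only and does not reproduce the content of the commit above. In vanilla TEI, list reaches elements like etym because list is a member of model.listLike, which in turn belongs to model.inter; the fragment removes that class membership.

```xml
<!-- Illustrative ODD fragment; NOT taken from the TEI Lex-0 sources. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="list" module="core" mode="change">
  <classes mode="change">
    <!-- Dropping this membership removes list from every content model
         that draws on model.listLike (and therefore on model.inter) -->
    <memberOf key="model.listLike" mode="delete"/>
  </classes>
</elementSpec>
```

The trade-off is that this change is global: contexts that should still allow lists, such as front-matter divisions, would then have to reference list explicitly in their own content models. A customization may instead rewrite the content models of the individual dictionary elements.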

You can test the schema in the 0.9.2 development branch by pointing to https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/dev-0.9.2/Schemas/TEILex0/out/TEILex0.rng
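
One common way to point a document at that schema is an xml-model processing instruction at the top of the file, which editors such as oXygen pick up for validation. The sketch below uses the URL given above; the document skeleton is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/dev-0.9.2/Schemas/TEILex0/out/TEILex0.rng"
            type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text go here -->
</TEI>
```

Because the URL points at a development branch, validation results may change as the branch is updated.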

This issue should stay open until I finish taking care of point 2 above.

ttasovac commented 1 year ago

I forgot to post here that 104d579b36cd8655650ab031d006758c6d3f7e69 in dev-0.9.2 fixed point 2 from above: def, gram, hyph, lang, lbl, orth, pron, stress, syll and usg can no longer contain list.

I'm not going to release 0.9.2 just yet, but I am closing this issue because Ana's original request to have lists and items available has been taken care of.

As usual, feel free to reopen if you have any questions related to this.