TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

more on dictionary: The element <usg> inside <def> #1800

Open chr-emil opened 6 years ago

chr-emil commented 6 years ago

I am currently working on defining a TEI format for three modern Norwegian dictionaries (two at www.ordbok.uib.no). The dictionaries are edited in a relational database system and are published both on the web and as printed books.

For each definition text, and also for each usage example (mostly created by the editors, as is usual for this kind of dictionary), the editor may add information about the area of usage. In the given system this information is taken from a predefined list (zool., bot., mil., outdated, …). In TEI, the element <sense> is used to encode the definition (meaning) structure, which is mostly a tree structure. Each <sense> may contain a list of textual definitions expressed in <def> (e.g. separated by ‘;’), followed by a list of usage examples in <cit>. A usage marker can be added to each of these textual definitions and examples. Intuitively, these markers should be encoded with <usg>. However, <usg> cannot occur inside a <def> element.

In the Guidelines we find that <usg> can only occur inside: dictScrap entry entryFree etym form gramGrp hom re sense xr. In my case I would need to encapsulate each <def> and <cit> in a separate <sense>, which is artificial and logically wrong. Also, the element can contain almost anything, even <email>, <height>, and <climate>.

The element has an area of application outside dictionaries. Like <def>, it may contain a rich variety of elements, including <superEntry>!

The dictionaries I work with are real existing dictionaries. Since TEI is not prescriptive, it should be adjusted to cover these dictionaries.

Suggestion: Extend the formal definitions of <def> and <cit> by adding <usg> as a possible sub-element.

iljackb commented 6 years ago

Hi,

<usg> should generally be encoded as a child of <sense>, not <def>; however, legacy dictionaries are of course not always so conveniently organized, which sounds like it might be your case. Could you give an example of what you want to be able to do and show exactly why it is you want to put <usg> in <def>?

Best,
Jack

chr-emil commented 6 years ago

@iljackb's comment that "the <usg> should generally be encoded as a child of <sense> not <def>" is a normative lexicographic statement. The dictionaries in question are not legacy dictionaries. I agree that it may be better to have one <usg> for a list of semicolon-separated definitions. However, if a lexicographer decides to allow a usage marker for each item in a list of definitions (or examples), TEI is not in a position to say "this is not allowed, reorganize your dictionary!"

scstanley7 commented 5 years ago

Just to revisit this, does the example provided (pairing <usg> and <def> within <sense>) resolve the issue? It seems not from your initial comment, but I'm struggling to understand how "encapsulat[ing] each <def> and <cit> in a separate <sense>" would be "artificial and logically wrong" without an example. I think a real-world example would help me understand this problem much better.
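
For readers following along, the two encodings being contrasted might look roughly like the sketch below. This is a hypothetical illustration, not taken from @chr-emil's dictionaries: the headword, definitions, and labels are invented, and the `type="dom"` value is only an assumption about how the labels would be typed.

```xml
<body>
  <!-- Workaround discussed in the thread: each definition gets its own
       nested <sense> so that it can carry its own <usg>. -->
  <entry>
    <form><orth>mole</orth></form>
    <sense n="1">
      <sense>
        <usg type="dom">zool.</usg>
        <def>a small burrowing mammal</def>
      </sense>
      <sense>
        <usg type="dom">mil.</usg>
        <def>a covert agent</def>
      </sense>
    </sense>
  </entry>

  <!-- Encoding requested in this issue: one <sense>, with a <usg>
       inside each <def>. Not permitted by the schema as discussed here. -->
  <entry>
    <form><orth>mole</orth></form>
    <sense n="1">
      <def><usg type="dom">zool.</usg> a small burrowing mammal</def>
      <pc>;</pc>
      <def><usg type="dom">mil.</usg> a covert agent</def>
    </sense>
  </entry>
</body>
```

In the first entry the extra <sense> wrappers exist only to give each label a place to attach, which seems to be the objection raised in the original comment.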

chr-emil commented 5 years ago

Hi, I have to refresh my memory and have a look into the issue. Best, Chr-E


raffazizzi commented 4 years ago

@chr-emil have you had a chance to look at this again? Thanks!

PFSchaffner commented 4 years ago

In our TEI-derived (i.e. modified TEI) schema for our admittedly very legacy Middle English Dictionary, usg is certainly allowed within def, and is widely used; it is hard to think of an alternative, since the usage labels are embedded within running prose, and usually (or at least often) do not stand separably off from them.
<def>A rung of a ladder; also <usg type="semantic" expan="figurative">fig.</usg></def>

<def>A scraping tool used in carpentry; <usg type="field" expan="medicine">med.</usg> an instrument for scraping bone.</def>

<def><usg type="field" expan="chess">Chess</usg> A rook, castle; also, a representation of a rook in a coat of arms.</def>

<def n="a">A rooftop; a housetop;</def> <def n="b">a roof as the highest part of a building or as a high or an exposed place; also <usg type="semantic" expan="figurative">fig.</usg>, in phrase: <hi rend="b">rote and ~</hi>.</def>

ebeshero commented 4 years ago

Council agrees this is good to implement with @PFSchaffner 's examples.

raffazizzi commented 4 years ago

I can see two ways of implementing this (allowing <usg> within <def>) and would like the council's opinion before moving ahead.

  1. Brute force: <usg> within <def> allowed directly

    <alternate minOccurs="0" maxOccurs="unbounded">
      <macroRef key="macro.paraContent"/>
      <elementRef key="usg"/>
    </alternate>
  2. Add model.lexicalRefinement, the class with <usg> to the content of <def>. Which strikes me as more elegant, but would also allow colloc gramGrp lbl pos subc.

    <alternate minOccurs="0" maxOccurs="unbounded">
      <macroRef key="macro.paraContent"/>
      <classRef key="model.lexicalRefinement"/>
    </alternate>
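
To make the practical difference between the two options concrete, here is a small hypothetical instance. The first <def> is adapted from the Middle English Dictionary examples above (with the attributes simplified); the second is invented for illustration, and its acceptance under option 2 is inferred only from the class membership listed above.

```xml
<sense>
  <!-- Accepted under both options: <usg> mixed into the definition text. -->
  <def>A scraping tool used in carpentry; <usg type="dom">med.</usg>
    an instrument for scraping bone.</def>

  <!-- Accepted only under option 2: <colloc> and <gramGrp> come along
       with <usg> as members of model.lexicalRefinement. -->
  <def>to depend <colloc>on</colloc> someone
    <gramGrp><pos>v.</pos></gramGrp></def>
</sense>
```

So the choice is partly about how much extra lexical markup inside <def> the Guidelines should sanction at the same time.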

martinascholger commented 3 years ago

Council meeting 2021-10-14: green for @raffazizzi to go with proposal 1 (brute force).

sydb commented 2 years ago

Nope. The brute-force approach (1) simply will not work for DTDs. Remember that macro.paraContent boils down to ( #PCDATA | g | s | cl | phr | w | m … )*. Thus approach (1) produces <!ELEMENT def ( %macro.paraContent; | usg )* >, which fails because, roughly speaking, #PCDATA can only appear as the first item in the content model (here a parenthesis comes first). See the spec if you care for the formal details. In any case, I am not sure what the right way to do this sort of thing is. One possibility (which I have not decided I like much) is to make the Stylesheets smart enough to notice this situation and “flatten out” the corresponding DTD content model. Another is to make a new macro, “paraContentPlusUsg”, I suppose.
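
The DTD constraint described above can be sketched as follows. This is a minimal illustration, not the TEI DTD: it assumes a cut-down stand-in for macro.paraContent containing only #PCDATA, g, and s, and the declarations for g, s, and usg are simplified to plain text.

```xml
<?xml version="1.0"?>
<!DOCTYPE def [
  <!-- What approach (1) expands to. Not a legal mixed-content model,
       because #PCDATA sits inside a nested group:
       <!ELEMENT def ((#PCDATA | g | s)* | usg)*>
  -->

  <!-- A "flattened" equivalent: one group, #PCDATA first,
       every permitted element listed as an alternative. -->
  <!ELEMENT def (#PCDATA | g | s | usg)*>

  <!-- Simplified stand-in declarations (assumptions for this sketch). -->
  <!ELEMENT g   (#PCDATA)>
  <!ELEMENT s   (#PCDATA)>
  <!ELEMENT usg (#PCDATA)>
  <!ATTLIST usg type CDATA #IMPLIED>
]>
<def>A rung of a ladder; also <usg type="semantic">fig.</usg></def>
```

XML permits mixed content only in the form (#PCDATA | a | b …)*, so any generated DTD would need the usg alternative merged into the same group as the macro's members, whether by flattening in the Stylesheets or by a dedicated macro.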

sydb commented 2 years ago

Talking this over with @martindholmes we are going to reverse this change for now, and re-visit how to accomplish <usg> in <def> if & when @chr-emil (or someone else) provides a use case example.

ttasovac commented 6 months ago

Hi guys. This question keeps popping up on TEI Lex-0. You've asked for some examples, and here are two from @anacastrosalgado's Portuguese dictionaries (taken from https://github.com/DARIAH-ERIC/lexicalresources/issues/152):

   <sense xml:id="MOR1.DLP.ASTROLABIO.s.1">
      <usg type="domain" corresp="#domain.astronomy" resp="#Salgado"/>
      <def>inſtrumento Aſtronomico,
         de que ſe uſa para ſe tomarem a altura dos
         aſtros</def>
      <pc>.</pc>
   </sense>
   <sense xml:id="MOR1.DLP.TELESCOPIO.s.1">
      <usg type="domain" corresp="#domain.astrology" resp="#Salgado"/>
      <def>inftrumento óptico de
         Aftronomia que ferve de obfervar na terra , ou
         no Ceo os objectos remotos, por meio da reflexão
         , ou refracção da luz</def>
      <pc>.</pc>
   </sense>

Things to consider:

I agree that this is a tricky situation, but as @iljackb says above, legacy dictionaries are often not "conveniently" organized. We have many more examples in which definitions do include collocation information, grammatical information, and so on. So the idea of "pure" definitions simply doesn't work in older dictionaries.