DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Colloc and form within def #38

Closed JessedeDoes closed 1 year ago

JessedeDoes commented 5 years ago

In some dictionaries (WNT for instance), we have forms and collocs embedded in definitions. Example:

<def>In de taal die men jegens kinderen bezigt, wordt de rechterhand vaak 
<colloc>het mooie, schoone, fraaie, goede, zoete handje</colloc> 
genoemd; de linkerhand heet dan bij tegenstelling 
<colloc>het leelijke of verkeerde handje</colloc>. ....
</def>

What would you suggest? Broadening the content model of <def>, or a different encoding?

ttasovac commented 5 years ago

Funny you should ask this, because earlier today we closed #24, which dealt with the content model of <def>. We can now use <xr> in <def>.

As for <colloc>, we also discussed this today, and I proposed that we go for <cit type='collocation'><quote>blah blah</quote></cit>. For this we don't have to further change anything about <def>.

So I'd recommend you give up <colloc> and do <cit type='collocation'> instead. The advantage of <cit> over <colloc> is twofold: 1) consistency with examples, translations etc., and 2) within <cit>, we can have defs, other cits etc., so if a dictionary translates or defines collocations (that was the example that started our discussions today), we can deal with it. <colloc> is too restrictive in this sense.
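The second advantage can be sketched roughly as follows. This is only an illustration under assumptions: the English idiom is a stand-in example, the nesting of <def> inside <cit> follows the description above, and the fragment has not been validated against the schema.

```xml
<sense>
  <!-- Sketch only: a collocation that the dictionary itself defines. -->
  <cit type="collocation">
    <quote>black sheep</quote>
    <!-- Assumption: the definition of the collocation sits inside the same <cit>. -->
    <def>an odd or disreputable member of a group, especially within a family</def>
  </cit>
</sense>
```

With <colloc>, there would be no place to attach the definition (or a translation) to the collocation itself.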

xlhrld commented 5 years ago

@JessedeDoes I don't quite see how this is a definition in the first place. It doesn't tell a thing about the meaning of »hand«. It does give collocators for »rechterhand« and »linkerhand« respectively and it describes the extra-linguistic constraint »in speech directed to children« – which is genuinely usg, right?

xlhrld commented 5 years ago

So yes, I'd opt for a different encoding here, which of course is not so easy because of the discursive nature of the description. Something like:

<sense>
  <usg type="socioCultural">In de taal die men jegens kinderen bezigt</usg><pc>,</pc>
  wordt de rechterhand vaak
  <cit type="collocation">
    <quote>het mooie, schoone, fraaie, goede, zoete handje</quote>
  </cit>
  genoemd;
  de linkerhand heet dan bij tegenstelling
  <cit type="collocation">
    <quote>het leelijke of verkeerde handje</quote>
  </cit><pc>.</pc>
</sense>

Currently, I have no immediate idea as to the stretches of text directly inside sense. For the etymology, we'd pull those into a cit and use seg to capture the »discursive glue«. So maybe an overarching cit of some yet unclear @type could be used to comprise the free floating text and the two proper cit/@type="collocation"?

<sense>
  <usg type="socioCultural">In de taal die men jegens kinderen bezigt</usg><pc>,</pc>
  <cit type="???">
    <seg>wordt de rechterhand vaak</seg>
    <cit type="collocation">
      <quote>het mooie, schoone, fraaie, goede, zoete handje</quote>
    </cit>
    <seg>genoemd</seg><pc>;</pc>
    <seg>de linkerhand heet dan bij tegenstelling</seg>
    <cit type="collocation">
      <quote>het leelijke of verkeerde handje</quote>
    </cit><pc>.</pc>
  </cit>
</sense>

Or maybe I'm led a bit astray here?

JessedeDoes commented 5 years ago

Maybe the example was not especially felicitous here. The general idea is that the <def> defines the collocations that it contains, a bit like the following Wikipedia example:

<def>
In the English language, <colloc>black sheep</colloc> is an idiom used to describe an odd or disreputable member of a group, especially within a family.
</def>

JessedeDoes commented 5 years ago

An option using nested entries would be something like this (Katrien would disagree):

<entry>
  <form type='lemma'>sheep</form>
  <entry type='mwe'>
    <def>
      In the English language, <form type='lemma'>black sheep</form> is an idiom used to describe an odd or disreputable member of a group, especially within a family.
    </def>
  </entry>
</entry>

ttasovac commented 1 year ago

We see this more as a <cit type="example"> or a <cit> of some other type, and not a proper definition. The part "odd or disreputable member of a group, especially within a family" is a definition, but since it's part of this more narrative structure, we don't think the whole sentence should be encoded as a definition.
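A minimal sketch of that reading, assuming type="example" is the value chosen (the comment leaves the exact type open) and leaving out the xml:id attributes a real entry would probably need:

```xml
<entry xml:lang="en">
  <form type="lemma"><orth>sheep</orth></form>
  <sense>
    <!-- Assumption: the whole narrative sentence is kept together as one <cit>,
         rather than being encoded as a <def>. -->
    <cit type="example">
      <quote>In the English language, black sheep is an idiom used to describe an odd or disreputable member of a group, especially within a family.</quote>
    </cit>
  </sense>
</entry>
```

Encoded this way, the defining phrase stays part of the narrative sentence and is not singled out as a definition of its own.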