DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Numbering grammatical homonyms #50

Closed by anacastrosalgado 4 years ago

anacastrosalgado commented 5 years ago

In the Portuguese Academy Dictionary, I have grammatical homonyms and need to number the entries.

What's the best way to encode the number? Using <lbl> or <num>?

capital:1 adj. m. e f.
capital:2 n. f.

<entry xml:id="DACL.CAPITAL:1" xml:lang="pt">
  <form type="lemma">
    <orth>capital</orth>
    <lbl>:1</lbl>
  </form>
</entry>
<entry xml:id="DACL.CAPITAL:2" xml:lang="pt">
  <form type="lemma">
    <orth>capital</orth>
    <lbl>:2</lbl>
  </form>
</entry>

xlhrld commented 5 years ago

As this really serves as a label that just happens to come in the form of a number (it could just as well be a letter or some other character), I'd opt for <lbl>, as you propose. I'd separate the punctuation from it with <pc>, though. I'd also try to keep the whitespace in sync with the print, but YMMV:

<form type="lemma">
  <orth>capital</orth><pc>:</pc>
  <lbl>1</lbl>
  <gramGrp>
    <gram type="partOfSpeech">adj.</gram>
    <gram type="gender">m. e f.</gram>
  </gramGrp>
</form>

Using form/@n="1" for the homonym marker may also be an option if you do not need or want to keep it as a text node.
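
A minimal sketch of that attribute-based variant, assuming the printed marker (":1") is dropped from the text and recorded only in @n. Whether @n belongs on <form> or higher up is a project decision, so treat this as an illustration rather than a recommendation:

```xml
<!-- Homonym number kept as an attribute instead of a text node -->
<form type="lemma" n="1">
  <orth>capital</orth>
  <gramGrp>
    <gram type="partOfSpeech">adj.</gram>
    <gram type="gender">m. e f.</gram>
  </gramGrp>
</form>
```

The trade-off is that the print form of the marker (colon plus digit) can no longer be recovered from the text content alone.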

anacastrosalgado commented 5 years ago

@xlhrld I agree with you. It could be a letter, a symbol or some other character. @ttasovac, do you agree with using <lbl>?

I'm not using <pc>, but I'll change that. Thanks.

anacastrosalgado commented 5 years ago

Like this? If the purpose is to simplify, I think I'm overcomplicating things.

capital (homonymicEntry and it's a simple word)

<entry type =homonymicEntry" xml:id="DACL.CAPITAL:1" xml:lang="pt">
  <form type="lemma">
    <orth norm="capital">
    <w type="simple">capital</w>
    <orth>capital</orth><pc>:</pc>
    <lbl>1</lbl>
    <pron>kɐpitˈał</pron>
    <gram type="partOfSpeech">adj.</gram>
    <gram type="gender">m. e f.</gram>
    </gramGrp>
  </form>

TomazErjavec commented 5 years ago

Hi, my 2c:

In short, I'd propose:

<entry type="homonymicEntry" n="1" xml:id="DACL.CAPITAL:1" xml:lang="pt">
  <form type="lemma">
    <orth><w type="simple">capital</w></orth>
    <pc>: </pc> <lbl type="homonymNumber">1</lbl>
    ...

bansp commented 5 years ago

Minor remark: the ID must be "non-colonized" (an xml:id value has to be an NCName, which cannot contain a colon), so the ":" should cause a validation error. Use a hyphen instead. And as Tomaž says, the snippet above is not well-formed: one closing tag and one opening tag are missing.
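
For illustration, a well-formed version of the proposal with a hyphen in the ID might look like the sketch below. The ID value DACL.CAPITAL-1 and the comment standing in for the elided content are assumptions, not the dictionary's actual encoding:

```xml
<!-- Hypothetical ID: hyphen instead of colon, so the value is a valid NCName -->
<entry type="homonymicEntry" n="1" xml:id="DACL.CAPITAL-1" xml:lang="pt">
  <form type="lemma">
    <orth><w type="simple">capital</w></orth>
    <pc>:</pc>
    <lbl type="homonymNumber">1</lbl>
    <!-- remaining form content omitted -->
  </form>
</entry>
```

Any separator allowed in an NCName (hyphen, full stop, underscore) would work equally well, as long as it is used consistently across the dictionary.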

anacastrosalgado commented 4 years ago

In the Portuguese Academy Dictionary, the homonym number is encoded as the n (number) attribute on the entry element:

<entry type="derivativeWord" xml:lang="pt" xml:id="antepassado.1" n="1">