DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Encoding portmanteau forms #78

Open phoenix-mossimo opened 4 years ago

phoenix-mossimo commented 4 years ago

Is there a TEI Lex-0 conformant way to encode portmanteau forms?

In our Coptic dictionary we have a number of cases where a single indivisible form corresponds to what would normally be two grammatical categories with overlapping properties. For example, the now mono-morphemic possessive pronouns were originally poly-morphemic, composed of two roots: a demonstrative + a (more ancient) possessive pronoun (i.e. "your house" is actually "this" + "your" + "house"). In Coptic both roots are still clearly distinguishable, but in the 1st and 2nd person the grammaticalization process has fused them into one form (i.e. creating a kind of "thior" house).

As the fused morphs belong to different grammatical categories, we decided to create nested <gramGrp> elements that encode the possessor and possessum information separately, e.g.:

     <gramGrp>
        <gramGrp>
           <pos>Demonstrative</pos>
           <gen>m.</gen>
           <number>sg.</number>
        </gramGrp>
        <gramGrp>
           <pos>Possessive suffix pronoun</pos>
           <subc>2. Pers.</subc>
           <gen>f.</gen>
           <number>sg.</number>
        </gramGrp>
     </gramGrp>

Is this OK from the LEX-0 point of view?

iljackb commented 4 years ago

In TEI Lex-0 Etym (still to be published), we would do this in etymology.

Here's an example with "Brexit":

     <entry xml:lang="en">
        <form type="lemma">
           <orth>Brexit</orth>
        </form>
        <gramGrp>
           <gram type="pos">noun</gram>
        </gramGrp>
        <sense>
           ....
        </sense>
        <etym type="portmanteau">
           <lbl>portmanteau of</lbl>
           <cit type="etymon">
              <form><orth>Britain</orth></form>
              <gramGrp>
                 <gram type="pos">noun</gram>
              </gramGrp>
           </cit>
           <pc>+</pc>
           <cit type="etymon">
              <form><orth>exit</orth></form>
              <!-- add pron? -->
              <gramGrp>
                 <gram type="pos">verb</gram>
              </gramGrp>
           </cit>
           <seg type="desc">Formed by analogy with the earlier coined
              term (but unrealized event) of <xr type="crossReference">Grexit</xr></seg>
        </etym>
     </entry>

I guess an important issue is whether you are encoding a print source (and thus need to keep the original ordering or content) or a born-digital source where you can choose how to structure it. If it's the former and you need to present that info the way it is in your example: nesting has never been discussed in Lex-0 (as far as I can remember) and is pretty unorthodox in TEI in general, but if you need to keep it that way, I'd change the use of <subc>, e.g.:

Demonstrative m. sg. Possessive suffix pronoun *2. Pers.* f. sg.

Also, in Lex-0 we use <gram> instead of the specific tags (e.g. <gram type="pos">, etc.).
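For illustration only, here is a rough sketch of how the nested grouping from the question might look once the specific tags are swapped for <gram>. The type values "gender", "number" and "person" are assumptions made for this sketch, not something confirmed in this thread, and whether nested <gramGrp> is acceptable in Lex-0 remains the open question:

```xml
<!-- Hypothetical sketch: the type values other than "pos" are assumed, not taken from the thread -->
<gramGrp>
   <!-- first fused component: the demonstrative -->
   <gramGrp>
      <gram type="pos">Demonstrative</gram>
      <gram type="gender">m.</gram>
      <gram type="number">sg.</gram>
   </gramGrp>
   <!-- second fused component: the possessive suffix pronoun -->
   <gramGrp>
      <gram type="pos">Possessive suffix pronoun</gram>
      <gram type="person">2. Pers.</gram>
      <gram type="gender">f.</gram>
      <gram type="number">sg.</gram>
   </gramGrp>
</gramGrp>
```

The values are copied unchanged from the example in the question; only the element names differ.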



phoenix-mossimo commented 4 years ago

Hm, the <etym> section is certainly an option, but the problem is that the discrepancy in grammatical information applies to the current form. In your example, "Brexit" is clearly a noun (<gram type="pos">noun</gram>) that derives from a verb and a noun (encoded in the <etym> section). In our case, "pa" belongs to two grammatical categories at once: it is both a demonstrative and a possessive pronoun.