uniformize notation of polysemous and homonymous lexemes

leoalenc commented 3 years ago

I have been using the convention of distinguishing variants of a given lexeme (that is, the same lexeme with different types) by suffixing _n to the lemma, where n is an index starting at 1. For example, for dizer ("to say") we currently have 4 variants, corresponding to different valencies:

dizer_1
dizer_2
dizer_3
dizer_4

However, some lexemes do not follow this pattern:

exigir
exigir_2
exigir_3

The goal of this issue is to make the notation uniform.
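A minimal sketch of how the convention could be checked and applied automatically, assuming the lexical identifiers are available as a list of strings. `normalize_ids` is a hypothetical helper (not existing PorGram tooling), and treating an unindexed identifier such as `exigir` as variant 1 is an assumption about the intended policy:

```python
import re

# "lemma" optionally followed by "_n", where n is a positive integer index.
ID_PATTERN = re.compile(r"^(?P<lemma>.+?)(?:_(?P<index>[1-9][0-9]*))?$")


def normalize_ids(lex_ids):
    """Return a dict mapping each identifier to its uniform "lemma_n" form.

    Hypothetical policy: an identifier without an index is treated as
    variant 1 of its lemma, so "exigir" becomes "exigir_1".
    """
    result = {}
    targets = set()
    for lex_id in lex_ids:
        m = ID_PATTERN.match(lex_id)
        lemma = m.group("lemma")
        index = m.group("index") or "1"  # missing index -> variant 1
        new_id = f"{lemma}_{index}"
        if new_id in targets:
            # e.g. both "exigir" and "exigir_1" exist: needs a manual decision
            raise ValueError(f"{lex_id!r} collides with an existing {new_id!r}")
        targets.add(new_id)
        result[lex_id] = new_id
    return result


print(normalize_ids(["dizer_1", "dizer_2", "exigir", "exigir_2", "exigir_3"]))
# {'dizer_1': 'dizer_1', 'dizer_2': 'dizer_2', 'exigir': 'exigir_1', 'exigir_2': 'exigir_2', 'exigir_3': 'exigir_3'}
```

A collision (e.g. both `exigir` and `exigir_1` present) is reported rather than resolved, since choosing which entry keeps index 1 needs a manual decision.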

arademaker commented 2 years ago

What about also specifying the POS, as in dizer_v_1, following https://github.com/delph-in/docs/wiki/ErgLeTypes?

leoalenc commented 2 years ago


@arademaker, great idea!
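A sketch of the suggested lemma_pos_index form, assuming POS codes are short lowercase tags without underscores (like the `v`, `n` and `a` in the ERG predicates). Note that the ERG lex ids themselves write `say_v1`, with no underscore before the index; the helpers below follow the `dizer_v_1` form proposed here, and `make_lex_id`/`parse_lex_id` are hypothetical names:

```python
import re

# Hypothetical format: lemma_pos_index, e.g. "dizer_v_1".
# POS tags are assumed to be lowercase codes that never contain underscores.
POS_ID_PATTERN = re.compile(
    r"^(?P<lemma>.+)_(?P<pos>[a-z]+)_(?P<index>[1-9][0-9]*)$"
)


def make_lex_id(lemma, pos, index):
    """Build a lemma_pos_index identifier; indices start at 1."""
    if index < 1:
        raise ValueError("indices start at 1")
    return f"{lemma}_{pos}_{index}"


def parse_lex_id(lex_id):
    """Split a lemma_pos_index identifier into (lemma, pos, index)."""
    m = POS_ID_PATTERN.match(lex_id)
    if m is None:
        raise ValueError(f"not in lemma_pos_index form: {lex_id!r}")
    return m.group("lemma"), m.group("pos"), int(m.group("index"))


print(make_lex_id("dizer", "v", 1))  # dizer_v_1
print(parse_lex_id("exigir_v_3"))    # ('exigir', 'v', 3)
```

Because the lemma group is greedy, lemmas that contain underscores (e.g. a hypothetical multiword `ter_que`) still parse, as long as the last two fields are the POS and the index.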

leoalenc commented 2 years ago

@arademaker, note that the ERG scheme is a bit more complicated than it seems at first glance:

$ grep -EhA 4 "^say(_[a-z]+[1-9])? " ~/trunk/*.tdl

say_n1 := n_-_mc-ntc_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_say_n_1_rel",
            PHON.ONSET con ] ].

--
say_v1 := v_np*_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_say_v_1_rel",
            PHON.ONSET con ] ].

say_v2 := v_pp*-cp_fin-imp_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS [ --COMPKEY _to_p_sel_rel,
                    KEYREL.PRED "_say_v_to_rel" ],
            PHON.ONSET con ] ].
--
say_v3 := v_pp_arg_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS [ --COMPKEY loc_abstr_rel,
                    KEYREL.PRED "_say_v_loc_rel" ],
            PHON.ONSET con ] ].
--
say_v4 := v_cp_fin-inf-q_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_say_v_to_rel",
            PHON.ONSET con ] ].

say_v5 := v_cp_inf-only_le &
 [ ORTH < "say" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_say_v_to_rel",
            PHON.ONSET con ] ].

The PRED value does not always carry the numeric index of the lexical identifier (lex id). For example, while the PRED of say_v1 is "_say_v_1_rel", that of say_v2, say_v4 and say_v5 is "_say_v_to_rel", and that of say_v3 is "_say_v_loc_rel".
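To see how often this happens, one could compare the index of each lex id with the sense field of its PRED. A rough sketch, assuming entries start with `<lex id> :=` at the beginning of a line and predicates follow the `_lemma_pos_sense_rel` pattern with no underscores inside the lemma; the regexes are simplifications, not a TDL parser, and the helper names are hypothetical:

```python
import re

# Excerpt of the ERG entries quoted above (ORTH and PHON lines omitted).
TDL = """
say_v1 := v_np*_le &
 [ SYNSEM [ LKEYS.KEYREL.PRED "_say_v_1_rel" ] ].
say_v4 := v_cp_fin-inf-q_le &
 [ SYNSEM [ LKEYS.KEYREL.PRED "_say_v_to_rel" ] ].
say_v5 := v_cp_inf-only_le &
 [ SYNSEM [ LKEYS.KEYREL.PRED "_say_v_to_rel" ] ].
"""

# Start of an entry: "<lex id> :=" at the beginning of a line.
ENTRY = re.compile(r"^(?P<lex_id>\S+) :=", re.MULTILINE)
# KEYREL.PRED value, quoted or not; the lookbehind skips ALT2KEYREL.PRED.
PRED = re.compile(r'(?<![A-Z0-9])KEYREL\.PRED "?(?P<pred>[^\s",\]]+)')


def lex_id_index(lex_id):
    """Trailing digits of an ERG-style lex id ("say_v4" -> "4"), else None."""
    m = re.search(r"[0-9]+$", lex_id)
    return m.group() if m else None


def pred_sense(pred):
    """Sense field of a "_lemma_pos_sense_rel" predicate, else None.

    Simplifying assumption: the lemma part contains no underscores.
    """
    parts = pred.split("_")
    if len(parts) == 5 and parts[0] == "" and parts[-1] == "rel":
        return parts[3]
    return None


def compare_indices(tdl):
    """Yield (lex_id, pred, same_index) for each entry in a TDL string."""
    entries = list(ENTRY.finditer(tdl))
    for i, entry in enumerate(entries):
        end = entries[i + 1].start() if i + 1 < len(entries) else len(tdl)
        found = PRED.search(tdl, entry.end(), end)  # search within this entry
        pred = found.group("pred") if found else None
        index = lex_id_index(entry.group("lex_id"))
        same = pred is not None and index is not None and index == pred_sense(pred)
        yield entry.group("lex_id"), pred, same


for row in compare_indices(TDL):
    print(row)
# ('say_v1', '_say_v_1_rel', True)
# ('say_v4', '_say_v_to_rel', False)
# ('say_v5', '_say_v_to_rel', False)
```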

leoalenc commented 2 years ago

Another interesting thing about the ERG, @arademaker, is that not every lex id has an index, and some indices start at 2; see the examples below. Suggestions on the notation we should adopt are welcome. It seems more consistent to me to always include an index, even when a word has no variants within a given part of speech. In the ERG, fast as an adverb has no index. As an adjective, noun and verb, however, it does, even though there are no variants in those cases.

$ grep -EhA 4 "^fast(_[a-z]+[1-9])? " ~/trunk/*.tdl

fast := av_-_i-vp-po_le &
 [ ORTH < "fast" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_fast_a_1_rel",
            PHON.ONSET con ] ].

fast_a1 := aj_-_i-er_le &
 [ ORTH < "fast" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_fast_a_1_rel",
            PHON.ONSET con ] ].

--
fast_n1 := n_-_c_le &
 [ ORTH < "fast" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_fast_n_1_rel",
            PHON.ONSET con ] ].

fast_v1 := v_-_le &
 [ ORTH < "fast" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_fast_v_1_rel",
            PHON.ONSET con ] ].

$ grep -EhA 4 "^today(_[a-z]+[1-9])? " ~/trunk/*.tdl

today_adv2 := av_-_i-vp-x_le &
 [ ORTH < "today" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_today_a_2_rel",
            PHON.ONSET con ] ].

today_adv3 := av_-_i-vp-x_le &
 [ ORTH < "to", "-", "day" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_today_a_2_rel",
            PHON.ONSET con ] ].

--
today_np2 := n_-_ad-time_le &
 [ ORTH < "to", "-", "day" >,
   SYNSEM [ LKEYS [ ALT2KEYREL.PRED _today_a_1_rel,
                    KEYREL.PRED time_n_rel ],
            PHON.ONSET con ] ].
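A sketch of the always-index policy suggested above, where each (lemma, POS) pair is numbered from 1 regardless of how many variants it has. The POS codes and the `assign_ids` helper are illustrative assumptions; unlike the ERG, `fast` as an adverb gets an index and `today` starts at 1:

```python
from collections import Counter


def assign_ids(entries):
    """Assign lemma_pos_n identifiers, numbering each (lemma, pos) pair from 1.

    Hypothetical policy: every identifier gets an index, even when the
    lemma has a single entry for that part of speech.
    """
    counts = Counter()
    ids = []
    for lemma, pos in entries:
        counts[(lemma, pos)] += 1
        ids.append(f"{lemma}_{pos}_{counts[(lemma, pos)]}")
    return ids


# POS codes below are illustrative, loosely modelled on the ERG lex ids.
print(assign_ids([("fast", "adv"), ("fast", "a"), ("fast", "n"), ("fast", "v")]))
# ['fast_adv_1', 'fast_a_1', 'fast_n_1', 'fast_v_1']
print(assign_ids([("today", "adv"), ("today", "adv"), ("today", "np")]))
# ['today_adv_1', 'today_adv_2', 'today_np_1']
```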

LR-POR / PorGram

uniformize notation of polysemous and homonymous lexemes #56