delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License
format function for TypeDefinition seems to mess up some of the Lists #357

Closed olzama closed 1 year ago

olzama commented 1 year ago

If I parse this TypeDefinition with iterparse:

main-vprn := basic-main-verb & norm-pronominal-verb &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, 
                           COMPS < #comps >,
                           CLTS #clt ],
    ARG-ST < #subj . < #comps . #clt > > ].

and then print it back out using the format function, I get this:

main-vprn := basic-main-verb & norm-pronominal-verb &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >,
                           COMPS < #comps >,
                           CLTS #clt ],
    ARG-ST < #subj, #comps . < #comps . #clt > > ].

Note the extra `, #comps` in ARG-ST, which was not there before.

olzama commented 1 year ago

Repro:

from delphin import tdl as pydelphin_tdl

for event, obj, lineno in pydelphin_tdl.iterparse('debug.txt'):
    print(pydelphin_tdl.format(obj))

debug.txt

goodmami commented 1 year ago

Ok thanks, I've confirmed the bug. I'm trying to figure out what to do about it. In cons-lists, a dot (`.`) before the final item indicates that the item should be the value of the final REST instead of `*null*`. E.g.,

< a, b >   -->  [ FIRST a, REST [ FIRST b, REST *null* ]]
< a . b >  -->  [ FIRST a, REST b ]
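
The two expansions above can be sketched in plain Python. This is only a toy model, not PyDelphin's implementation: `expand` is a hypothetical helper, and nested dicts stand in for AVMs. It also shows why a dotted nested list and a comma-delimited list with a dotted tail end up with the same FIRST/REST structure.

```python
NULL = "*null*"

def expand(items, tail=NULL):
    """Expand list items into nested FIRST/REST dicts.

    `tail` is the value of the final REST: "*null*" for a plain
    comma-delimited list, or the dotted item for a list like < a . b >.
    """
    avm = tail
    # Build from the last item backwards so each cell wraps the next one.
    for item in reversed(items):
        avm = {"FIRST": item, "REST": avm}
    return avm

comma_list = expand(["a", "b"])        # < a, b >
dotted_list = expand(["a"], tail="b")  # < a . b >

# < #subj . < #comps . #clt > >  versus  < #subj, #comps . #clt >
nested = expand(["#subj"], tail=expand(["#comps"], tail="#clt"))
flat = expand(["#subj", "#comps"], tail="#clt")
print(nested == flat)
```

In this model the two spellings are indistinguishable after expansion, so a formatter needs some extra information about how the list was written if it is to reproduce the original form.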

For debugging I modified your debug.txt as follows, where the second type has the list defined using a comma instead of a dot with a nested list:

main-vprn := basic-main-verb & norm-pronominal-verb &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >,
                           COMPS < #comps >,
                           CLTS #clt ],
    ARG-ST < #subj . < #comps . #clt > > ].

main-vprn2 := basic-main-verb & norm-pronominal-verb &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >,
                           COMPS < #comps >,
                           CLTS #clt ],
    ARG-ST < #subj , #comps . #clt > ].

The first thing I notice is that the value of REST in the original version is another ConsList, because the parser encountered the nested `<` character, whereas in the modified version it's just a plain AVM:

>>> from delphin import tdl
>>> orig, mod = [obj for _, obj, _ in tdl.iterparse('debug.txt')]
>>> orig['ARG-ST'].features()
[('FIRST', <Coreference object at 139968742236480>), ('REST', <ConsList object at 139968744179648>)]
>>> mod['ARG-ST'].features()
[('FIRST', <Coreference object at 139968742237344>), ('REST', <AVM object at 139968740017344>)]

The features of these AVMs are the same, though (I need to cast the Coreference objects to strings because equality is not defined for Coreference objects):

>>> str(orig['ARG-ST.FIRST']) == str(mod['ARG-ST.FIRST'])
True
>>> str(orig['ARG-ST.REST.FIRST']) == str(mod['ARG-ST.REST.FIRST'])
True
>>> str(orig['ARG-ST.REST.REST']) == str(mod['ARG-ST.REST.REST'])
True
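
The three comparisons above could be wrapped in a small helper. `differing_paths` is a hypothetical name, not part of PyDelphin. It assumes only that both objects support indexing by dotted path, as in the session above, and it compares via `str()` for the same reason given there. The dicts below are stand-ins so the sketch runs without a TDL file.

```python
def differing_paths(a, b, paths):
    """Return the paths at which str(a[path]) and str(b[path]) differ."""
    return [p for p in paths if str(a[p]) != str(b[p])]

# Stand-in data keyed by dotted path; with PyDelphin one would pass the
# parsed objects (orig, mod) instead of these dicts.
orig_like = {"ARG-ST.FIRST": "#subj", "ARG-ST.REST.FIRST": "#comps",
             "ARG-ST.REST.REST": "#clt"}
mod_like = dict(orig_like)

paths = ["ARG-ST.FIRST", "ARG-ST.REST.FIRST", "ARG-ST.REST.REST"]
print(differing_paths(orig_like, mod_like, paths))
```

An empty result means the two structures agree at every listed path.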

Also note that PyDelphin has no problem formatting the modified version:

>>> print(tdl.format(mod))
main-vprn2 := basic-main-verb & norm-pronominal-verb &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >,
                           COMPS < #comps >,
                           CLTS #clt ],
    ARG-ST < #subj, #comps . #clt > ].

I think PyDelphin should be able to recreate the dotted form since it knows the value of REST is a ConsList.
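
As a rough illustration of that idea, here is a minimal sketch with toy classes. It is not PyDelphin's code or its actual fix: `Cons`, `from_brackets`, and `format_list` are hypothetical, and the `from_brackets` flag stands in for the ConsList-versus-AVM distinction seen in the REST values above.

```python
from dataclasses import dataclass

@dataclass
class Cons:
    """Toy cons cell.

    from_brackets is True when the cell was written with its own < ... >
    (like a ConsList value) and False when it came from a comma-separated
    item inside an enclosing list (like the plain AVM above).
    """
    first: str
    rest: object  # Cons, a string such as a coreference, or None for *null*
    from_brackets: bool = True

def format_list(cell):
    parts = [cell.first]
    rest = cell.rest
    # Only flatten REST cells that were written inline with commas.
    while isinstance(rest, Cons) and not rest.from_brackets:
        parts.append(rest.first)
        rest = rest.rest
    body = ", ".join(parts)
    if rest is None:
        return f"< {body} >"
    # A bracketed REST is formatted as a nested list after the dot.
    tail = format_list(rest) if isinstance(rest, Cons) else rest
    return f"< {body} . {tail} >"

orig = Cons("#subj", Cons("#comps", "#clt", from_brackets=True))
mod = Cons("#subj", Cons("#comps", "#clt", from_brackets=False))
print(format_list(orig))
print(format_list(mod))
```

The design choice here is that the formatter never emits a REST cell's FIRST both as a comma item and inside the nested list, which is the duplication reported in this issue.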