delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

Function for iterating over scope readings #221

Open goodmami opened 5 years ago

goodmami commented 5 years ago

#211 describes a module for scope operations. The current implementation for v1.0.0 satisfies the goals completely, except for a function to iterate over scope readings. I'm creating this issue for that last function, in case it needs to be delayed to a later release.

arademaker commented 3 years ago

I'm not sure I understand what "a function to iterate over scope readings" means. Is the code for generating scoped MRSs from a given MRS already in the library or not? Sorry, I didn't get it.

goodmami commented 3 years ago

A non-scope-resolved structure is ambiguous over one or more scope-resolved structures. To iterate over scope readings is to expand the ambiguous structure to those resolved structures and iterate over them (e.g., in a for-loop, or the function could just return a list of scope-resolved MRSs).

As the text of this issue states, a function to do this is not yet present in the library. You can see the documentation for the delphin.scope module to see what is currently available.

arademaker commented 3 years ago

Yes, the first part was already completely clear to me. The second part is not so much. Actually, the comment in https://github.com/delph-in/pydelphin/issues/241 is not clear at all.

It looks like the actual code for producing all possible scoped trees is not trivial. One naive approach is to collect all holes and handles, produce all possible permutations of them, and filter out the ones that violate any HCONS constraint. But we also need to check all predications on each LABEL/scope and make sure that no variable is 'used' before its introduction by a quantifier.
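To make the naive approach concrete, here is a minimal sketch over a hand-built toy structure for "Every student read a book". It is not PyDelphin code: the EPS table, the handle names, and the helper functions are all hypothetical, and it assumes that only quantifier BODY holes may intervene in a qeq.

```python
from itertools import permutations

# Hypothetical toy structure, not a PyDelphin object.
# label -> (predicate, {role: hole}, variable bound, variables used)
EPS = {
    "h4": ("_every_q", {"RSTR": "h5", "BODY": "h6"}, "x3", frozenset()),
    "h7": ("_student_n_of", {}, None, frozenset({"x3"})),
    "h10": ("_a_q", {"RSTR": "h11", "BODY": "h12"}, "x9", frozenset()),
    "h13": ("_book_n_of", {}, None, frozenset({"x9"})),
    "h1": ("_read_v_1", {}, None, frozenset({"x3", "x9"})),
}
TOP = "h0"
QEQS = [("h0", "h1"), ("h5", "h7"), ("h11", "h13")]  # (hole, label)


def holes():
    """All holes: the top hole plus every scopal argument."""
    return [TOP] + [h for _, roles, _, _ in EPS.values() for h in roles.values()]


def outscopes(plug, hole, label):
    """Check hole =q label: equal, or separated only by quantifier BODYs."""
    current = plug[hole]
    while current != label:
        roles = EPS[current][1]
        if "BODY" not in roles:
            return False
        current = plug[roles["BODY"]]
    return True


def is_reading(plug):
    """True if the plugging forms a tree that respects binding and qeqs."""
    seen = set()

    def visit(hole, bound):
        label = plug[hole]
        _, roles, binds, uses = EPS[label]
        if not uses <= bound:
            return False  # variable used outside its quantifier's scope
        seen.add(label)
        inner = bound | {binds} if binds else bound
        return all(visit(h, inner) for h in roles.values())

    if not visit(TOP, frozenset()) or len(seen) != len(EPS):
        return False  # unbound variable, or a label unreachable from TOP
    return all(outscopes(plug, hole, label) for hole, label in QEQS)


def render(plug, hole=TOP):
    """Render a plugging as a nested term."""
    pred, roles, _, _ = EPS[plug[hole]]
    if not roles:
        return pred
    return f"{pred}({','.join(render(plug, h) for h in roles.values())})"


def naive_readings():
    """Try every assignment of labels to holes and keep the valid ones."""
    hs = holes()
    for labels in permutations(EPS):
        plug = dict(zip(hs, labels))
        if is_reading(plug):
            yield render(plug)


for reading in naive_readings():
    print(reading)
```

In this toy case only 2 of the 120 permutations survive the checks, which hints at why the approach scales poorly for longer sentences.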

goodmami commented 3 years ago

> Yes, the first part was already completely clear to me. The second part is not so much.

Maybe some pseudocode would help?

>>> from delphin import scope
>>> from delphin.codecs import simplemrs
>>> m = simplemrs.load(...)[0]       # an MRS with underspecified quantifier scope
>>> for _m in scope.resolve(m):      # iterate over scope readings (this issue; resolve() does not exist yet)
...     print(simplemrs.encode(_m))  # prints the scope-resolved MRS

> Actually, the comment in #241 is not clear at all.

Well, #241 is really more about representing the underspecified scopal tree fragments than about resolving the structures, so maybe it's less relevant for you.

> it looks like the actual code for producing all possible scoped trees is not trivial. One naive approach is [...]

It's been a while since I've attempted to write the function, but I recall that naively enumerating all and checking for constraint violations is intractable (for similar reasons this issue describes the function as "iterating" over readings instead of returning a list; it would take too much time and memory to construct the full list for longer sentences). It's better to get the partial orderings from qeqs, etc., first and then only enumerate the feasible ones.
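A minimal sketch of that idea, assuming a hand-built toy structure for "Every student read a book" rather than real PyDelphin objects: the generator below plugs holes depth-first and abandons a branch as soon as a qeq or variable-binding constraint can no longer hold, so readings are produced one at a time. The EPS and QEQ tables and the readings() function are hypothetical names, and the sketch assumes that BODY holes carry no qeq of their own.

```python
from itertools import islice

# Hypothetical toy structure, not a PyDelphin object.
# label -> (predicate, {role: hole}, variable bound, variables used)
EPS = {
    "h4": ("_every_q", {"RSTR": "h5", "BODY": "h6"}, "x3", frozenset()),
    "h7": ("_student_n_of", {}, None, frozenset({"x3"})),
    "h10": ("_a_q", {"RSTR": "h11", "BODY": "h12"}, "x9", frozenset()),
    "h13": ("_book_n_of", {}, None, frozenset({"x9"})),
    "h1": ("_read_v_1", {}, None, frozenset({"x3", "x9"})),
}
TOP = "h0"
QEQ = {"h0": "h1", "h5": "h7", "h11": "h13"}  # hole -> label it must outscope


def readings(agenda=None, unused=None, plug=None):
    """Lazily yield hole -> label pluggings, pruning dead ends early."""
    if agenda is None:
        # Each agenda item: (hole, variables in scope, pending qeq target)
        agenda, unused, plug = [(TOP, frozenset(), QEQ[TOP])], set(EPS), {}
    if not agenda:
        if not unused:
            yield dict(plug)  # every hole is plugged and every label placed
        return
    (hole, bound, target), rest = agenda[0], agenda[1:]
    for label in sorted(unused):
        _, roles, binds, uses = EPS[label]
        if not uses <= bound:
            continue  # a variable would be used outside its quantifier
        if target and label != target and "BODY" not in roles:
            continue  # only quantifiers may intervene in a qeq
        inner = bound | {binds} if binds else bound
        pending = None if label == target else target
        # An unsatisfied qeq target is passed down through the BODY hole.
        children = [
            (h, inner, QEQ.get(h, pending if role == "BODY" else None))
            for role, h in roles.items()
        ]
        plug[hole] = label
        yield from readings(children + rest, unused - {label}, plug)
        del plug[hole]


# Take readings one at a time instead of building the full list.
for reading in islice(readings(), 2):
    print(reading)
```

Because readings() is a generator, a caller can stop after the first few readings with next() or itertools.islice without enumerating the rest.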

arademaker commented 3 years ago

I am trying to make sense of the LKB code, but it is not trivial! I hope that @john-a-carroll can eventually help me read it. It seems that some extra checks for implicit existentials may pre-date some of the conditions now expected of MRSs.

arademaker commented 3 years ago

https://www.coli.uni-saarland.de/projects/chorus/utool/ can be integrated with LKB via server-mode. That could be a good starting point for PyDelphin too, don't you think?

goodmami commented 3 years ago

Possibly, yes, but I'm concerned about two things:

arademaker commented 3 years ago

1. Yes, I just tested.
2. I need some answers from the Utool authors before I start sketching any code.

For the XML encoding of MRS, I got:

% java -jar ~/Downloads/Utool-3.1.1.jar solve test.mrs.xml -O term-prolog
A semantic error occurred while decoding the graph.
The graph is not leaf-labelled.

I don't know what a leaf-labelled graph is. With the Prolog encoding, I found that I need to remove the ICONS before Utool can parse it.

There are many possible output codecs; the term-prolog output is:

['_a_q'('_book_n_of','_every_q'('_student_n_of','_read_v_1')),
'_every_q'('_student_n_of','_a_q'('_book_n_of','_read_v_1'))] 

So a complete solution would still need to recover the variables introduced by each generalized quantifier and the arguments of each predicate.
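As a first step toward that post-processing, the term-prolog output could be read into nested Python tuples. The sketch below is an assumption-laden illustration, not part of PyDelphin or Utool: the function names are hypothetical, and the grammar it accepts (a bracketed list of terms built from quoted or bare atoms) is inferred only from the sample output above.

```python
import re

# One token: a quoted atom, a bare atom, or a punctuation mark.
TOKEN = re.compile(r"\s*('[^']*'|[a-z]\w*|[()\[\],])")


def tokenize(text):
    """Split term-prolog text into atoms and punctuation."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_term(tokens):
    """Consume one term from the front of tokens: (functor, [arguments])."""
    functor = tokens.pop(0).strip("'")
    args = []
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        while True:
            args.append(parse_term(tokens))
            if tokens.pop(0) == ")":  # otherwise the token was a comma
                break
    return (functor, args)


def parse_solutions(text):
    """Parse a non-empty bracketed list of terms into a list of trees."""
    tokens = tokenize(text)
    if not tokens or tokens.pop(0) != "[":
        raise ValueError("expected a list of solutions")
    terms = []
    while True:
        terms.append(parse_term(tokens))
        if tokens.pop(0) == "]":  # otherwise the token was a comma
            break
    return terms


SAMPLE = """['_a_q'('_book_n_of','_every_q'('_student_n_of','_read_v_1')),
'_every_q'('_student_n_of','_a_q'('_book_n_of','_read_v_1'))] """

for functor, args in parse_solutions(SAMPLE):
    print(functor, [arg[0] for arg in args])
```

The resulting trees contain only predicate names, so mapping each node back to its predication in the original MRS (to recover variables and arguments) would still have to be done separately.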

arademaker commented 3 years ago

BTW, note that the example above is simple: the RSTR of each quantifier contains only a noun predicate. But for a more complicated sentence:

udef_q('_cream_n_1',
       '_a_q'(udef_q('_milk_n_1',
             udef_q('_churn_v_1&_or_c',
                udef_q('_fat_a_1&_globules/nns_u_unknown&_make_v_1',
                   '_edible_a_1&_emulsion_n_1&_of_p'))),
          unknown))

I will need to deal with non-trivial RSTRs with e-variables from adjectives and verb predications.

oepen commented 3 years ago

the LKB actually has code to interface with utool, though quite possibly that is only active in the LOGON builds. if so, it should not be too hard to tweak the +:logon conditional compilation features to also activate it in the FOS universe. @arademaker, you can probably find the utool interface code ... it might well be in the MT package.

goodmami commented 3 years ago

Thanks for testing. I'm not sure about the term-prolog codec, but maybe it could be added.

If it accepts the MRS-Prolog codec output without ICONS, then another problem is that PyDelphin doesn't currently let you drop ICONS with the delphin convert command, but you can when scripting (scroll to the end to see the difference):

>>> from delphin.codecs import simplemrs, mrsprolog
>>> m = simplemrs.load('foo.mrs')
>>> print(mrsprolog.encode(m[0]))
psoa(h0,e2,[rel('unknown',h1,[attrval('ARG',x4),attrval('ARG0',e2)]),rel('_the_q',h5,[attrval('ARG0',x4),attrval('RSTR',h6),attrval('BODY',h7)]),rel('compound',h8,[attrval('ARG0',e9),attrval('ARG1',e10),attrval('ARG2',x11)]),rel('udef_q',h12,[attrval('ARG0',x11),attrval('RSTR',h13),attrval('BODY',h14)]),rel('_cat_n_1',h15,[attrval('ARG0',x11)]),rel('_chase_v_1',h8,[attrval('ARG0',e10),attrval('ARG1',i16),attrval('ARG2',x4)]),rel('generic_entity',h8,[attrval('ARG0',x4)]),rel('card',h8,[attrval('ARG0',e18),attrval('ARG1',x4),attrval('CARG','1')])],hcons([qeq(h0,h1),qeq(h6,h8),qeq(h13,h15)]),icons([topic(e10,x4)]))
>>> m[0].icons.clear()
>>> print(mrsprolog.encode(m[0]))
psoa(h0,e2,[rel('unknown',h1,[attrval('ARG',x4),attrval('ARG0',e2)]),rel('_the_q',h5,[attrval('ARG0',x4),attrval('RSTR',h6),attrval('BODY',h7)]),rel('compound',h8,[attrval('ARG0',e9),attrval('ARG1',e10),attrval('ARG2',x11)]),rel('udef_q',h12,[attrval('ARG0',x11),attrval('RSTR',h13),attrval('BODY',h14)]),rel('_cat_n_1',h15,[attrval('ARG0',x11)]),rel('_chase_v_1',h8,[attrval('ARG0',e10),attrval('ARG1',i16),attrval('ARG2',x4)]),rel('generic_entity',h8,[attrval('ARG0',x4)]),rel('card',h8,[attrval('ARG0',e18),attrval('ARG1',x4),attrval('CARG','1')])],hcons([qeq(h0,h1),qeq(h6,h8),qeq(h13,h15)]))

In several codecs (or all? I forget), newer things like ICONS are only printed if they are non-empty. This use case suggests that a --no-icons option for the delphin convert command would be useful (currently there are --no-properties and --no-lnk).

arademaker commented 3 years ago

Thank you @goodmami and @oepen. Indeed, the hard part is not producing an MRS encoding readable by utool, but maybe the post-processing of utool's output to recover the predication arguments and the variables for the quantifiers.

I will check the LKB code to understand the integration with utool, but I am curious about what utool does differently (performance?) from the native LKB Lisp code for producing the scope-resolved MRSs.