delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

Don't crash on bad input when converting from testsuite #200

Closed arademaker closed 5 years ago

arademaker commented 5 years ago

After creating a testsuite, I tried to export dmrs representations of a set of sentences with the command:

$ delphin convert --pretty-print -t simpledmrs current/

But I got the following error:

/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h117) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h121) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Xmrs structure is not connected.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h26 qeq h118) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h26 qeq h119) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h26 qeq h114) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h21 qeq h60) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h20 qeq h61) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: e29 is the intrinsic variable for more than one EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h29 qeq h124) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h29 qeq h125) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h41 qeq h51) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h28 qeq h57) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: x115 is the bound variable for more than one quantifier.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h110) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h115) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h22 qeq h110) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h50 qeq h103) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h50 qeq h107) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h60 qeq h202) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h75) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: x65 is the bound variable for more than one quantifier.
Lo variable of HCONS (h16 qeq h79) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h70) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h76) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h74) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h63 qeq h143) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h57 qeq h137) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h67 qeq h141) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h16 qeq h71) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h22 qeq h75) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Lo variable of HCONS (h22 qeq h71) is not the label of any EP.
  warn(str(ex), XmrsWarning)
Traceback (most recent call last):
  File "/usr/local/bin/delphin", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/delphin/main.py", line 35, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/delphin/main.py", line 62, in call_convert
    predicate_modifiers=args.predicate_modifiers))
  File "/usr/local/lib/python3.7/site-packages/delphin/commands.py", line 94, in convert
    return dumps(xs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/simpledmrs.py", line 56, in dumps
    return serialize(ms, properties=properties, indent=kwargs.get('indent'))
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/simpledmrs.py", line 90, in serialize
    return delim.join(_encode_dmrs(m, properties, indent=indent) for m in ms)
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/simpledmrs.py", line 90, in <genexpr>
    return delim.join(_encode_dmrs(m, properties, indent=indent) for m in ms)
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/simpledmrs.py", line 123, in _encode_dmrs
    for l in links(m):
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/components.py", line 313, in links
    lblheads = {v: lsh(v) for v, vd in _vars.items() if 'LBL' in vd['refs']}
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/components.py", line 313, in <dictcomp>
    lblheads = {v: lsh(v) for v, vd in _vars.items() if 'LBL' in vd['refs']}
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/xmrs.py", line 524, in labelset_heads
    scope_sets[nid] = _ivs_in_scope(nid, _eps, _vars, _hcons)
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/xmrs.py", line 1219, in _ivs_in_scope
    ivs.update(_ivs_in_scope(conj_nid, _eps, _vars, _hcons))
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/xmrs.py", line 1219, in _ivs_in_scope
    ivs.update(_ivs_in_scope(conj_nid, _eps, _vars, _hcons))
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/xmrs.py", line 1219, in _ivs_in_scope
    ivs.update(_ivs_in_scope(conj_nid, _eps, _vars, _hcons))
  [Previous line repeated 982 more times]
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/xmrs.py", line 1215, in _ivs_in_scope
    elif var_sort(val) == HANDLESORT:
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/components.py", line 58, in var_sort
    return sort_vid_split(v)[0]
  File "/usr/local/lib/python3.7/site-packages/delphin/mrs/components.py", line 41, in sort_vid_split
    match = var_re.match(vs)
RecursionError: maximum recursion depth exceeded while calling a Python object

Any idea?

goodmami commented 5 years ago

You're getting a lot of warnings about malformed scope trees in the MRSs, and based on the location of the error I suspect you have a structure with a cycle in the scope "tree". These are malformed MRSs, so there may be a bug in the ERG (though I would check whether it's still a problem with the 2018 version of the ERG before reporting it). However, PyDelphin should check for cycles in this recursive _ivs_in_scope() function.

Can you provide the MRS that causes the issue?
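
As a rough illustration of the cycle check mentioned above, here is a minimal sketch of a recursive scope traversal that remembers which labels it has already visited. It is not PyDelphin's actual _ivs_in_scope(); the function name, its parameters, and the two dictionaries are assumptions made up for this example.

```python
def ivs_in_scope(label, children, ivs_at, seen=None):
    """Collect intrinsic variables at or below `label`, visiting each label once.

    `children` maps a label to the labels scoped directly under it and
    `ivs_at` maps a label to the intrinsic variables of its EPs. Both are
    hypothetical structures for this sketch, not PyDelphin internals.
    """
    if seen is None:
        seen = set()
    if label in seen:
        # Already visited: either a cycle or a shared subtree, so stop here
        # instead of recursing until the interpreter's limit is hit.
        return set()
    seen.add(label)
    found = set(ivs_at.get(label, ()))
    for child in children.get(label, ()):
        found |= ivs_in_scope(child, children, ivs_at, seen)
    return found


# A deliberately cyclic scope "tree": h1 -> h4 -> h1
children = {"h1": ["h4"], "h4": ["h1"]}
ivs_at = {"h1": ["e2"], "h4": ["e5"]}
result = ivs_in_scope("h1", children, ivs_at)
```

Without the `seen` set, the same traversal on this input would recurse until Python raises RecursionError, which matches the shape of the traceback above.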

goodmami commented 5 years ago

@arademaker are you able to find the MRS that causes the error? I'm unable to reproduce even with this obvious cycle in a made-up MRS:

[ LTOP: h0
  RELS: < [ foo<0:3> LBL: h1 ARG0: e2 ARG1: h3 ]
          [ bar<4:7> LBL: h4 ARG0: e5 ARG1: h6 ] >
  HCONS: < h0 qeq h1 h3 qeq h4 h6 qeq h1 > ]
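
To make the cycle in this made-up MRS explicit, the sketch below encodes its handles as a plain directed graph and searches for a cycle. The encoding (one edge from each EP label to its handle-valued ARG1, and one edge per qeq) and the find_cycle helper are assumptions for illustration; they do not reflect how PyDelphin stores an MRS.

```python
def find_cycle(edges, start):
    """Return the first cycle reachable from `start` as a list of handles, or None."""
    path, on_path, done = [], set(), set()

    def visit(node):
        if node in on_path:
            # Found a back edge: report the cycle, closing it on `node`.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in edges.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


edges = {
    # HCONS: hi qeq lo
    "h0": ["h1"], "h3": ["h4"], "h6": ["h1"],
    # EP label -> handle-valued ARG1 (foo: h1 -> h3, bar: h4 -> h6)
    "h1": ["h3"], "h4": ["h6"],
}
cycle = find_cycle(edges, "h0")
print(cycle)
```

Starting from LTOP h0, the search reports the loop through h1, h3, h4, and h6 back to h1.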

arademaker commented 5 years ago

@goodmami sorry for this late reply. I am trying to reproduce the error. You are probably following the discussion at https://delphinqa.ling.washington.edu/t/error-in-processing-profiles-using-art-for-treebanking/245, right? I am using http://sweaglesw.org/linguistics/libtsdb/art for creating a profile with 5,602 sentences. It is still running.

With a profile built from a sample of 866 sentences of <= 100 characters, I was able to finish creating the profile, and I got some errors with the command that started this issue.

$ delphin convert --pretty-print -t simpledmrs repsol-100
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Lo variable of HCONS (h0 qeq h1) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Xmrs structure is not connected.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Lo variable of HCONS (h9 qeq h45) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: x45 is the bound variable for more than one quantifier.
Lo variable of HCONS (h47 qeq h55) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: x43 is the bound variable for more than one quantifier.
Lo variable of HCONS (h45 qeq h53) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Lo variable of HCONS (h10 qeq h51) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Lo variable of HCONS (h9 qeq h25) is not the label of any EP.
  warn(str(ex), XmrsWarning)
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:246: XmrsWarning: Lo variable of HCONS (h44 qeq h46) is not the label of any EP.
  warn(str(ex), XmrsWarning)
...

I am attaching the file if you want to reproduce the error.

repsol-100.tar.gz

goodmami commented 5 years ago

Thanks for the update. I don't have a lot of time in the next few weeks, but if I get a chance I'll try to determine if there's a bug in PyDelphin or something it can help with. Those messages you're getting are "warnings" because PyDelphin attempts to read and model the MRSs anyway, but they are actually indicating malformed MRSs, so there may be limits to the kinds of processing that can be done to them.

goodmami commented 5 years ago

@arademaker Sorry for the wait, I am now getting around to this issue. When I try to convert the repsol-100 profile, I see the warnings you posted in the previous message, and after the last warning (about h44 qeq h46) I get a KeyError: 'e2', which is different from the error you posted in the first message. These represent different kinds of malformedness in the MRSs, but for conversion it should be the same fix: I should catch, log, and move on when encountering errors in batch conversion.
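
The "catch, log, and move on" approach could look roughly like the sketch below. It is only an outline under assumptions: convert_all and fake_convert are hypothetical names, and the real conversion call in PyDelphin's convert command is not shown here.

```python
import logging

logger = logging.getLogger("convert_sketch")


def convert_all(items, convert_one):
    """Convert each item, logging failures instead of aborting the whole batch."""
    results, failures = [], []
    for i, item in enumerate(items):
        try:
            results.append(convert_one(item))
        except (KeyError, RecursionError) as exc:
            # These are the two error types reported in this thread; a real
            # implementation might need to catch a broader set.
            logger.warning("skipping item %d: %s: %s", i, type(exc).__name__, exc)
            failures.append(i)
    return results, failures


def fake_convert(item):
    """Hypothetical stand-in for converting one MRS; fails on 'bad'."""
    if item == "bad":
        raise KeyError("e2")
    return item.upper()


results, failures = convert_all(["a", "bad", "b"], fake_convert)
```

Returning the indices of the failed items alongside the results makes it possible to report afterwards which items in the profile could not be converted.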

goodmami commented 5 years ago

#205 appears to be a duplicate of this, but I can separate the two issues so that the other one targets the cause of the error and this one is about making conversion more robust in general.

arademaker commented 5 years ago

Hi, sorry for my silence... I am trying to reproduce the error and find at least one sentence that produces the warnings. I have seen these warnings with much simpler sentences. The problem is that if I loop over a list of sentences and print the sentence text before the MRS, the warning messages printed to STDOUT are delayed, so I can't sync the printed sentence text with the warnings.
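
One possible way to tie each warning to the sentence that triggered it is to record warnings per item with the standard library's warnings.catch_warnings, rather than relying on the order of printed output. In this sketch, run_with_warnings and fake_process are hypothetical names, fake_process stands in for whatever parses and loads one sentence, and "A dog barks" is an invented input.

```python
import warnings


def run_with_warnings(items, process):
    """Yield (item, result, messages) with the warnings raised while processing that item."""
    for item in items:
        with warnings.catch_warnings(record=True) as caught:
            # "always" keeps repeated warnings from being deduplicated.
            warnings.simplefilter("always")
            result = process(item)
        yield item, result, [str(w.message) for w in caught]


def fake_process(sentence):
    """Hypothetical stand-in for parsing a sentence and loading its MRS."""
    if "untying" in sentence:
        warnings.warn("Xmrs structure is not connected.", UserWarning)
    return len(sentence)


report = list(run_with_warnings(["A dog barks", "A man is untying a shoe"], fake_process))
for sentence, _, messages in report:
    print(sentence, messages)
```

Because the warnings are collected as objects inside the loop, each one is attached to its sentence regardless of how the output streams are buffered.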

arademaker commented 5 years ago

Using simplemrs.loads(..., single=True, errors='strict'), I found a fairly short sentence that triggers one of the warnings listed above, the "Xmrs structure is not connected." case:

A man is untying a shoe

NOTE: parsed 1 / 1 sentences, avg 9018k, time 0.04160s
/usr/local/lib/python3.7/site-packages/delphin/mrs/simplemrs.py:248: XmrsWarning: Xmrs structure is not connected.
  warn(str(ex), XmrsWarning)
_un-_a_rvrs
[ TOP: h0
  INDEX: e2
  RELS: < [ _a_q<0:1> LBL: h4 ARG0: x3 RSTR: h5 BODY: h6 ]
          [ _man_n_1<2:5> LBL: h7 ARG0: x3 ]
          [ _tie_v_1<9:16> LBL: h1 ARG0: e8 ARG1: i9 ARG2: x10 ]
          [ _un-_a_rvrs<9:16> LBL: h1 ARG0: e2 ARG1: e8 ]
          [ _a_q<17:18> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]
          [ _shoe_n_1<19:23> LBL: h14 ARG0: x10 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h14 > ]
dmrs {
  [top=10002 index=10003]
  10000 [_a_q<0:1> x PERS=3 NUM=sg IND=+];
  10001 [_man_n_1<2:5> x PERS=3 NUM=sg IND=+];
  10002 [_tie_v_1<9:16> e SF=prop TENSE=pres MOOD=indicative PROG=+ PERF=-];
  10003 [_un-_a_rvrs<9:16> e SF=prop TENSE=pres MOOD=indicative PROG=+ PERF=-];
  10004 [_a_q<17:18> x PERS=3 NUM=sg IND=+];
  10005 [_shoe_n_1<19:23> x PERS=3 NUM=sg IND=+];
  10000:RSTR/H -> 10001;
  10002:ARG2/NEQ -> 10005;
  10003:ARG1/EQ -> 10002;
  10004:RSTR/H -> 10005;
}
{e8:
 _1:_a_q<0:1>[BV x3]
 x3:_man_n_1<2:5>[]
 e8:_tie_v_1<9:16>[ARG2 x10]
 e2:_un-_a_rvrs<9:16>[ARG1 e8]
 _2:_a_q<17:18>[BV x10]
 x10:_shoe_n_1<19:23>[]
}

arademaker commented 5 years ago

The LKB_FOS system also gives me the same error. The ARG0 of _man_n_1 is disconnected from the rest of the graph: the ARG1 of _tie_v_1 is an i variable (i9 in the MRS above).
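
The disconnection can be checked mechanically from the DMRS output above. The sketch below copies the six node ids and four links from that output and groups them into connected components, ignoring link direction; the components helper is a hypothetical function written for this example, not part of PyDelphin.

```python
def components(nodes, links):
    """Group node ids into connected components, ignoring link direction."""
    neighbors = {n: set() for n in nodes}
    for start, end in links:
        neighbors[start].add(end)
        neighbors[end].add(start)
    seen, groups = set(), []
    for node in nodes:
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(neighbors[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Node ids and links taken from the DMRS printed above
nodes = [10000, 10001, 10002, 10003, 10004, 10005]
links = [(10000, 10001), (10002, 10005), (10003, 10002), (10004, 10005)]
groups = components(nodes, links)
print(groups)
```

The quantifier 10000 and _man_n_1 (10001) form one component and the remaining four nodes form another, because no link joins _man_n_1 to _tie_v_1.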

goodmami commented 5 years ago

Thanks for the example. There's only so much PyDelphin can do with disconnected graphs, but if it cannot convert one item in a set it should still be able to continue to the next (which is what this bug is now about).

For the problematic analysis I suggest filing a bug with the ERG: https://github.com/delph-in/erg/issues. Dan does check that list from time to time. Be sure to indicate which version of the ERG (2018?) you are using.

arademaker commented 5 years ago

I got these results with the trunk version of the ERG. Yes, this issue is now only about the robustness of PyDelphin.