delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

simplemrs.encode() doesn't escape quotes properly #367

Closed: EricZinda closed this issue 1 year ago

EricZinda commented 1 year ago

Calling simplemrs.encode(mrs) on the MRS for:

"Blue" is in this folder

produces a string that simplemrs.loads() can't load. It fails because the " characters in the surface string are not escaped:

File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 61, in loads
    ms = list(_decode(s.splitlines()))
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 174, in _decode
    yield _decode_mrs(lexer)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 213, in _decode_mrs
    lexer.expect_type(RBRACK)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/util.py", line 539, in expect_type
    return self.expect(*((arg, None) for arg in args), skip=skip)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/util.py", line 508, in expect
    raise self._errcls('expected: ' + err,
delphin.mrs._exceptions.MRSSyntaxError: 
  line 1, character 4
    [ ""blue" is in this folder" TOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] RELS: < [ proper_q<0:6> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ] [ fw_seq<-1:-1> LBL: h7 ARG0: x3 ARG1: i8 ] [ quoted<1:5> LBL: h7 ARG0: i8 CARG: "blue" ] [ _in_p_loc<10:12> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x10 [ x PERS: 3 NUM: sg IND: + ] ] [ _this_q_dem<13:17> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ _folder_n_of<18:24> LBL: h14 ARG0: x10 ARG1: i15 ] > HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h14 > ]
        ^
MRSSyntaxError: expected: ]

Converting the string to:

'Blue' is in this folder

(with single quotes) does round-trip properly.

goodmami commented 1 year ago

Ok, thanks @EricZinda, it looks like it is not escaping the quotes in the surface string on serialization. Should be an easy fix. Want to give it a shot?
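For illustration, here is a minimal sketch of the kind of escaping the serializer needs. The helper names `escape_surface` and `unescape_surface` are hypothetical and this is not PyDelphin's actual code; it simply assumes that `"` and `\` get backslash-escaped, and uses a small regex to show why the unescaped string cannot be read back as one quoted token.

```python
import re

# Hypothetical helpers, not PyDelphin's API: backslash-escape and
# unescape the characters that would otherwise end a quoted string early.

def escape_surface(s: str) -> str:
    # Escape backslashes first so the ones added for quotes aren't doubled.
    return s.replace('\\', '\\\\').replace('"', '\\"')

def unescape_surface(s: str) -> str:
    # Drop the backslash from every backslash-escaped character.
    return re.sub(r'\\(.)', r'\1', s)

# A double-quoted token whose body may contain backslash escapes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

surface = '"Blue" is in this folder'

naive = '"' + surface + '"'
escaped = '"' + escape_surface(surface) + '"'

# Without escaping, the first token ends right after the opening quote.
naive_token = QUOTED.match(naive).group(1)
# With escaping, the whole surface string is read back as one token.
escaped_token = unescape_surface(QUOTED.match(escaped).group(1))
```

In the naive case the lexer sees an empty string followed by stray text, which is consistent with the "expected: ]" error reported at character 4 above.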

goodmami commented 1 year ago

> Want to give it a shot?

@EricZinda never mind, I went ahead and fixed it. Try out v1.8.1 and let me know if it works for you.

EricZinda commented 1 year ago

@goodmami thanks so much. v1.8.1 works great!

arademaker commented 1 year ago

This is weird; I could not reproduce the error reported by @EricZinda in the previous version of PyDelphin.

from delphin import ace
from delphin.codecs import simplemrs

response = ace.parse('erg.dat', '"Blue" is in this folder')
m = response.result(1).mrs()  # results are 0-indexed, so this is the second reading
# Write the serialized MRS to a file, then read it back and decode it.
with open('lixo.txt', 'w') as f:
    print(simplemrs.encode(m, indent=True), file=f)
with open('lixo.txt') as f:
    a = f.read()
m1 = simplemrs.loads(a)[0]
print(simplemrs.encode(m1, indent=True))

No error!

goodmami commented 1 year ago

@arademaker The surface field of the MRS being populated depends on how ACE is invoked. If you use the standard ACE interface at the command line and use PyDelphin to convert it, you should see it:

$ ace -g ../erg-2018.dat -1 <<< "\"Blue\" is in this folder." | delphin convert -f ace --color=never
NOTE: 1 readings, added 1694 / 597 edges to chart (305 fully instantiated, 101 actives used, 180 passives used) RAM: 5835k
NOTE: parsed 1 / 1 sentences, avg 5835k, time 0.02512s
[ ""Blue" is in this folder."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
  RELS: < [ udef_q<0:6> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
          [ _blue_a_1<0:6> LBL: h7 ARG0: x3 ARG1: i8 ]
          [ _in_p_loc<10:12> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x9 [ x PERS: 3 NUM: sg IND: + ] ]
          [ _this_q_dem<13:17> LBL: h10 ARG0: x9 RSTR: h11 BODY: h12 ]
          [ _folder_n_of<18:25> LBL: h13 ARG0: x9 ARG1: i14 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]

In this case, PyDelphin looks for the SENT: line generated by ACE and uses it to fill the surface field. When you use the ACE module in Python, it defaults to ACE's --tsdb-stdout mode, which might not report the :surface field, in which case PyDelphin would not be able to populate the MRS structure with the information.
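To make the SENT: lookup concrete, here is a rough sketch of the idea. The `surface_from_ace_output` helper and the abbreviated sample text are made up for illustration; PyDelphin's real handling of ACE output is more involved than this.

```python
from typing import Optional

def surface_from_ace_output(text: str) -> Optional[str]:
    """Return the sentence from the first 'SENT: ' line, or None if absent."""
    prefix = 'SENT: '
    for line in text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None

# Abbreviated, hand-written stand-ins for ACE output (not real logs).
with_sent = 'SENT: "Blue" is in this folder.\n[ LTOP: h0 ... ]\n'
without_sent = '[ LTOP: h0 ... ]\n'

found = surface_from_ace_output(with_sent)       # surface string is available
missing = surface_from_ace_output(without_sent)  # nothing to fill the field with
```

When no such line is present, there is nothing to put in the surface field, so the encoder never emits the quoted sentence and the bug does not show up.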