delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

Quote predicates containing reserved characters in SimpleMRS #372

Closed by goodmami 8 months ago

goodmami commented 8 months ago

From https://github.com/delph-in/pydelphin/issues/371#issuecomment-1806904088

In SimpleMRS, a string predicate containing reserved characters (whitespace, <, [, etc.) is serialized without quotes after being read in, producing a form that cannot be decoded again.

>>> from delphin.codecs import simplemrs
>>> m = simplemrs.decode('[RELS: < [ "foo bar" LBL: h0 ARG0: e2 ] >]')
>>> print(simplemrs.encode(m))
[ RELS: < [ foo bar LBL: h0 ARG0: e2 ] > ]
>>> simplemrs.decode(simplemrs.encode(m))
[...]
delphin.mrs._exceptions.MRSSyntaxError: 
    [ RELS: < [ foo bar LBL: h0 ARG0: e2 ] > ]
                    ^
MRSSyntaxError: expected: a feature

On serialization, each predicate should be checked for such characters and quoted if any are present.
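
A minimal sketch of such a check might look like the following. It is not PyDelphin's actual implementation: the function name `format_predicate`, the exact set of reserved characters, and the backslash escaping of embedded quotes are all assumptions made for illustration. The real set should follow the SimpleMRS grammar used by the decoder.

```python
import re

# Assumption: whitespace, angle brackets, square brackets, and double quotes
# are treated as reserved. The authoritative set is whatever the SimpleMRS
# decoder cannot read inside a bare (unquoted) predicate.
_RESERVED = re.compile(r'[\s<>\[\]"]')


def format_predicate(pred: str) -> str:
    """Return *pred* as it should appear in serialized output.

    Hypothetical helper: predicates containing a reserved character (or
    empty predicates) are wrapped in double quotes; others are left bare.
    """
    if pred == '' or _RESERVED.search(pred):
        # Assumption: backslashes and embedded quotes are backslash-escaped
        # so that the quoted form stays unambiguous.
        escaped = pred.replace('\\', '\\\\').replace('"', '\\"')
        return f'"{escaped}"'
    return pred
```

With a check like this, the predicate `foo bar` from the example above would be written as `"foo bar"`, while an ordinary predicate such as `_dog_n_1` would stay unquoted.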