delph-in / pydmrs

A library for manipulating DMRS structures
MIT License

`_rel` suffixes in pred strings #19

Closed guyemerson closed 8 years ago

guyemerson commented 8 years ago

So I believe Ann would prefer that we don't keep the _rel suffixes in pred strings. It would not be hard to change this (and update the unit tests) - but would this break anyone's code if it were changed now?

AlexKuhnle commented 8 years ago

I will change my code accordingly today if necessary, but I think it's a good idea, so go ahead!


fcbond commented 8 years ago

This will break for grammars where not all preds end in '_rel' (such as Jacy).

I am prepared to fix Jacy :-). This may be an issue in other people's grammars as well, but maybe we should enforce this (or just strip _rel off everything).


Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

anncopestake commented 8 years ago

I guess the proposal is just to strip off the _rel when it is there?
That is what the MRS->RMRS code does in the LKB from what I remember - then the DMRS code just took over the predicate representation from RMRS.


guyemerson commented 8 years ago

To clarify - we can process strings both with and without _rel. At the moment, if you input a pred as a string in either form, we construct the same object, but if you then ask for the string version of that pred, we always return it with a _rel suffix. What I'm asking is if we should return it without the _rel.
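As a rough illustration of the behaviour described above, here is a minimal sketch. `SimplePred`, `from_string`, and `to_string` are hypothetical names made up for this example and are not the pydmrs classes; the only point is that both input forms build the same object, and the open question is which form the string output should default to.

```python
from typing import NamedTuple

SUFFIX = "_rel"


class SimplePred(NamedTuple):
    """Hypothetical stand-in for a pred object (not the pydmrs API)."""
    name: str  # always stored without the suffix

    @classmethod
    def from_string(cls, string):
        # Accept both "foo_rel" and "foo" and build the same object.
        if string.endswith(SUFFIX):
            string = string[:-len(SUFFIX)]
        return cls(string)

    def to_string(self, with_suffix=False):
        # The question in this thread is which default to use here.
        return self.name + SUFFIX if with_suffix else self.name


a = SimplePred.from_string("_dog_n_1_rel")
b = SimplePred.from_string("_dog_n_1")
```

With this sketch, `a == b` holds, and `a.to_string()` gives the suffix-less form unless `with_suffix=True` is passed.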

goodmami commented 8 years ago

There was a discussion on the developers list about these. The _rel suffix is basically just there to carve out a namespace for predicates (so predicates don't clash with grammar types in the hierarchy), and otherwise few had any qualms about removing it.

If we keep the _rel suffix in the grammar but strip it on *MRS serialization, then generation becomes more difficult, because we have to search for matching predicates by optionally re-adding the suffix. E.g., Jacy has _rel-less predicates like coord (i.e. not coord_rel), so for these you don't want to append the suffix, but for most others you do. Things would be even more difficult if there were both variants (coord and coord_rel) in the grammar, because then stripping _rel confuses the two. Luckily neither the ERG nor Jacy has this situation (but I haven't checked other grammars). Stephan suggested a user-configurable suffix (so you could have _relation or what have you), but I think this would make things worse.

If we are prepared to say that a grammar that doesn't consistently use _rel is a broken grammar, then we can rest assured that always stripping during serialization (and appending during deserialization) is the correct way to do things. Jacy has other issues with predicates that need fixing, so making Jacy conformant WRT _rel isn't a worry.
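To make the generation-side lookup problem concrete, here is a small sketch. It assumes the grammar's predicates are available as a plain set of strings; `strip_rel`, `find_collisions`, and `resolve` are hypothetical helpers written for this example, not functions from any DELPH-IN tool.

```python
SUFFIX = "_rel"


def strip_rel(pred):
    """Drop a trailing _rel if it is present."""
    return pred[:-len(SUFFIX)] if pred.endswith(SUFFIX) else pred


def find_collisions(grammar_preds):
    """Return short forms that map to more than one grammar predicate."""
    seen = {}
    for pred in grammar_preds:
        seen.setdefault(strip_rel(pred), set()).add(pred)
    return {short: forms for short, forms in seen.items() if len(forms) > 1}


def resolve(short, grammar_preds):
    """Map a suffix-less pred back to the grammar's own spelling.

    Tries the form with _rel and the bare form; fails if neither or both exist.
    """
    candidates = [p for p in (short + SUFFIX, short) if p in grammar_preds]
    if len(candidates) != 1:
        raise LookupError(f"cannot resolve {short!r}: candidates {candidates}")
    return candidates[0]


# Toy predicate inventory (assumed for illustration only).
toy_grammar = {"_dog_n_1_rel", "coord"}
ambiguous_grammar = {"coord", "coord_rel"}
```

With the toy inventory, `resolve("coord", toy_grammar)` finds the bare form and `resolve("_dog_n_1", toy_grammar)` finds the suffixed one. With both variants present, `find_collisions(ambiguous_grammar)` reports `coord` and `resolve` raises, which is the situation that a consistent use of _rel would rule out.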

BTW if we want to fully normalize preds, we also want to strip quotes and possibly case-normalize the string.
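A sketch of what that fuller normalization might look like, assuming quotes only appear at the ends of the string and that lowercasing is the desired case normalization. `normalize_pred` is a hypothetical name for this example.

```python
def normalize_pred(pred):
    """Strip surrounding quotes, lowercase, and drop a trailing _rel."""
    pred = pred.strip("\"'")   # handles "..." as well as a lone leading '
    pred = pred.lower()        # case-normalize before checking the suffix
    if pred.endswith("_rel"):
        pred = pred[:-len("_rel")]
    return pred
```

Lowercasing before the suffix check means a form like `_DOG_N_1_REL` is handled the same way as `_dog_n_1_rel`.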

FWIW, in pyDelphin I chose to store the exact form in which the predicate is read. I have a function for normalizing to the short form, which is used for visualization (e.g. in Demophin), but for serialization I use the original form when possible.
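The store-the-original design can be sketched roughly as follows. This is not pyDelphin's actual code or API; `StoredPred`, `short`, and `serialize` are hypothetical names, and the normalization rules are the assumed ones from earlier in this thread (strip quotes, lowercase, drop _rel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredPred:
    """Hypothetical pred that keeps the exact string it was read from."""
    original: str

    @property
    def short(self):
        # Normalized short form, intended for display only.
        s = self.original.strip("\"'").lower()
        return s[:-len("_rel")] if s.endswith("_rel") else s

    def serialize(self):
        # Round-trip the original spelling unchanged.
        return self.original


p = StoredPred('"_Dog_n_1_rel"')
```

Here `p.short` is the normalized display form, while `p.serialize()` returns the string exactly as it was read, so a grammar with _rel-less predicates round-trips without guessing.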

Maybe this is all more than you wanted to know. If you're just wondering if you should always print preds with _rel or always without _rel, I think always printing them without _rel is preferable, since that's a familiar transformation. If you add _rel to something that didn't initially have it (like Jacy's coord, person, adversative, and so on), that would be more confusing, I think.

guyemerson commented 8 years ago

Thanks for everyone's comments. I think the conclusion is to return pred strings without the _rel suffix, and I've updated the code accordingly.