coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.

Apache License 2.0

Use original strings in "input" field of parser outputs #50

Closed alexanderkoller closed 5 years ago

alexanderkoller commented 5 years ago

After further discussion on https://github.com/cfmrp/mtool/issues/64, the correct thing to do is to use the original value of the "input" field in the MRP files the parser produces. The bug in #48 suggests that at some point(s?) in our code, we simply concatenate the tokens in the companion data with spaces and put the result in the "input" field. This is incorrect and dangerous.

We need to make sure that we use the correct strings in the "input" fields.
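To illustrate the problem, here is a minimal sketch. The sentence and its tokenization are made up for this example and are not taken from any corpus. It assumes that anchors are character offsets into "input", as in the MRP format, so an offset computed against the original string can point at the wrong characters once the tokens have been re-joined with spaces.

```python
# Hypothetical sentence and tokenization, for illustration only.
original = 'He said "no."'
tokens = ["He", "said", '"', "no", ".", '"']

# What the issue describes: tokens concatenated with spaces.
rejoined = " ".join(tokens)

# An anchor for the token "no", computed against the original string.
start = original.index("no")
anchor = {"from": start, "to": start + len("no")}

# The same offsets select different characters in the two strings.
in_original = original[anchor["from"]:anchor["to"]]
in_rejoined = rejoined[anchor["from"]:anchor["to"]]

print(repr(original), repr(rejoined))
print(repr(in_original), repr(in_rejoined))
```

The re-joined string has extra spaces around the punctuation, so every offset after the first inserted space is shifted.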

alexanderkoller commented 5 years ago

@namednil check other corpora, then close.

namednil commented 5 years ago

It indeed works for UCCA; I still have to check the other corpora.

namednil commented 5 years ago

Since we don't carry around the input for AMR, I decided to restore the "input" field at evaluation time. When the command line argument "--input path/to/input.mrp" is given to EvaluateAMR (or EvaluateMRP), we now enter the correct value there and -- for formalisms with anchoring -- remove illegal anchors, printing a warning that we did so.
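For readers who want to see the idea, the following is a rough sketch of such a restoration step. It is not the code in EvaluateAMR or EvaluateMRP. The function name `restore_inputs` is hypothetical, graphs are assumed to be JSON lines matched by their "id" field, and an anchor is assumed to be "illegal" when its "from"/"to" offsets do not fit inside the original string; the actual criterion used by the parser may differ.

```python
import json
import warnings


def restore_inputs(predicted_lines, original_lines):
    """Copy the original "input" string into each predicted graph.

    Hypothetical helper: both arguments are iterables of JSON lines,
    one MRP graph per line, matched by their "id" field.
    """
    # Map graph id -> original input string.
    inputs = {}
    for line in original_lines:
        graph = json.loads(line)
        inputs[graph["id"]] = graph["input"]

    restored = []
    for line in predicted_lines:
        graph = json.loads(line)
        text = inputs.get(graph["id"])
        if text is None:
            # No original available; leave the graph untouched.
            restored.append(graph)
            continue
        graph["input"] = text
        for node in graph.get("nodes", []):
            anchors = node.get("anchors")
            if anchors is None:
                continue  # formalism without anchoring, e.g. AMR
            # Assumption: an anchor is legal if it fits inside the text.
            legal = [a for a in anchors
                     if 0 <= a["from"] < a["to"] <= len(text)]
            if len(legal) != len(anchors):
                warnings.warn("removed illegal anchors in graph %s"
                              % graph["id"])
            node["anchors"] = legal
        restored.append(graph)
    return restored
```

Matching by "id" rather than by line position keeps the step robust if the two files list the graphs in a different order.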