coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

UCCA postprocessing: details of MRP format #42

Closed namednil closed 5 years ago

namednil commented 5 years ago

The UCCA postprocessing doesn't produce the exact same format: the content looks the same, but the information is encoded differently.

Examples:

- Output of post-processing: f1
- MRP gold data: f2

namednil commented 5 years ago

@mariomgmn I pushed small changes to the script, so please pull before you modify something.

luciaelizabeth commented 5 years ago

How big of an issue is this? Personally I like our output better :-) (though sandwiches - ?)

Omri had written before when I asked about tokenization: "In order to address these non-uniformities (and the non-uniform tokenization between the meaning representations), the evaluation tool is designed to be somewhat forgiving to tokenization errors, and only penalizes such incongruencies at the anchoring level."

Does this help us? I.e., we have the anchors correct, so we should be okay.

namednil commented 5 years ago

It's an extremely important issue because we only get points for things that are "correct", whatever that means. However, it should be very easy to fix :) At least in the example above, tokenization is fine; the anchors, e.g. <0:5>, match perfectly.
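For concreteness, here is a minimal sketch of what "the anchors match" means. The helper names are hypothetical, and this assumes the simplified view that a node's anchors count as correct when its set of character spans is identical to the gold set (in MRP JSON an anchor is stored as `{"from": 0, "to": 5}`; `<0:5>` is the display notation used above):

```python
def parse_anchor(text):
    """Parse a displayed anchor like '<0:5>' into a (from, to) character span.
    (Hypothetical helper; MRP JSON stores anchors as {"from": 0, "to": 5}.)"""
    start, end = text.strip("<>").split(":")
    return int(start), int(end)

def anchors_match(gold, system):
    """Simplified check: the span sets must be identical (order-insensitive)."""
    return {parse_anchor(a) for a in gold} == {parse_anchor(a) for a in system}

# Identical spans match regardless of order; any shifted span breaks the match.
print(anchors_match(["<0:5>", "<6:9>"], ["<6:9>", "<0:5>"]))  # True
print(anchors_match(["<0:5>"], ["<0:6>"]))                    # False
```

This is why "forgiving to tokenization errors" still penalizes us at the anchoring level if our character offsets drift from the gold ones.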

mariomgmn commented 5 years ago

Come to think of it, this happened to me before when I tried to get dot-formatted files by id with mtool. I assumed it was just the dot files, though. But yes, if this is a problem, it looks like it would be an easy fix :)

mariomgmn commented 5 years ago

@namednil, could you send me an example of our output and the MRP gold data?

namednil commented 5 years ago

Potentially, the problem never existed and I just messed something up. @mariomgmn, can you take the contracted MRP graphs I sent you earlier today, run the post-processing, and evaluate against the gold data (where possible; otherwise exclude those sentences) to see how much we currently lose with the UCCA pipeline?

This will also help reveal bugs (in particular in the post-processing step).

mariomgmn commented 5 years ago

Sure.

mariomgmn commented 5 years ago

I tried running it with the latest version of mtool but it threw an error. I then ran it with an older version that I had, and this is what I got:

{
  "n": 30,
  "exact": 6,
  "tops": {"g": 30, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
  "labels": {"g": 0, "s": 437, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
  "properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
  "anchors": {"g": 441, "s": 437, "c": 336, "p": 0.7688787185354691, "r": 0.7619047619047619, "f": 0.7653758542141229},
  "edges": {"g": 642, "s": 579, "c": 449, "p": 0.7754749568221071, "r": 0.6993769470404985, "f": 0.7354627354627356},
  "attributes": {"g": 28, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
  "all": {"g": 1141, "s": 1453, "c": 785, "p": 0.5402615278733655, "r": 0.6879929886064855, "f": 0.605242868157286},
  "time": 280.404636
}
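As a sanity check on how to read these rows: the p/r/f fields follow directly from the g (gold), s (system), and c (correct) counts, and the F-score simplifies to 2c/(g+s). A quick sketch reproducing the "anchors" row:

```python
def prf(g, s, c):
    """Precision, recall, F1 from gold/system/correct counts."""
    p = c / s if s else 0.0
    r = c / g if g else 0.0
    f = 2 * c / (g + s) if g + s else 0.0
    return p, r, f

# Reproduce the "anchors" row above: g=441, s=437, c=336.
p, r, f = prf(441, 437, 336)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.7689 0.7619 0.7654
```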

I find it surprisingly low. I'm running it with a trace now to see if there's a bug.

alexanderkoller commented 5 years ago

Hey hey, UCCA numbers. What exactly does this measure now? Is this the contracted and then uncontracted training data, compared against the original training data?

The number that seems to really hurt is that we get 437 system labels, but there were 0 gold labels. This seems to be the main reason why "all" is much less than the mean of "anchors" and "edges".

How can "anchors" be substantially less than 100%? What's going on there?
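The pooled "all" row can be reproduced from the per-type counts in the output above, which makes the label effect concrete: the 437 spurious system labels inflate s without adding anything to c, dragging pooled precision down. A small sketch (counts copied from the mtool output; `prf` as before, with f = 2c/(g+s)):

```python
def prf(g, s, c):
    """Precision, recall, F1 from gold/system/correct counts."""
    p = c / s if s else 0.0
    r = c / g if g else 0.0
    return p, r, (2 * c / (g + s) if g + s else 0.0)

# Per-type (g, s, c) counts from the mtool output above.
counts = {
    "tops":       (30, 0, 0),
    "labels":     (0, 437, 0),
    "properties": (0, 0, 0),
    "anchors":    (441, 437, 336),
    "edges":      (642, 579, 449),
    "attributes": (28, 0, 0),
}
g = sum(x[0] for x in counts.values())
s = sum(x[1] for x in counts.values())
c = sum(x[2] for x in counts.values())
print(g, s, c)                      # 1141 1453 785 -- matches the "all" row
print(round(prf(g, s, c)[0], 4))    # 0.5403

# Dropping the 437 spurious labels alone would lift pooled precision:
print(round(prf(g, s - 437, c)[0], 4))  # 0.7726
```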

mariomgmn commented 5 years ago

> Hey hey, UCCA numbers. What exactly does this measure now? Is this the contracted and then uncontracted training data, compared against the original training data?

Yes, as far as I understand it that's what's going on.

We already knew about the label issue but decided to keep the labels for readability; if I recall correctly, it's a very easy fix. I'm not sure what's going on with the anchors, though. I just ran the trace and I'm checking the sentences right now.

alexanderkoller commented 5 years ago

The interesting work will be in the edges, right? Then I would suggest fixing the labels and anchors so we can focus on the edges afterwards.

mariomgmn commented 5 years ago

Yes. For now, I looked at the bugs in the decontraction code and managed to find a few; note the higher edge scores. Additionally, in many cases we're scoring below 1.0 due to remote edges. That said, there's still a stray bug somewhere that doesn't seem to appear very often, but I have a feeling I know where to look.

"properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
"anchors": {"g": 441, "s": 437, "c": 337, "p": 0.7711670480549199, "r": 0.764172335600907, "f": 0.7676537585421411},
"edges": {"g": 642, "s": 593, "c": 551, "p": 0.9291736930860034, "r": 0.8582554517133957, "f": 0.8923076923076924},
"attributes": {"g": 28, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
"all": {"g": 1141, "s": 1467, "c": 888, "p": 0.6053169734151329, "r": 0.7782646801051709, "f": 0.6809815950920245}

I'm not entirely sure what's happening with the anchors, but yeah, maybe we want to tackle that next.

alexanderkoller commented 5 years ago

This is excellent progress. From where I stand, the anchors are next, and probably comparatively easy (if tedious) to fix.

alexanderkoller commented 5 years ago

Can this issue be closed? If the training anchors are still an issue, we should make a new issue for that.