Closed mayhewsw closed 5 years ago
Can you check why the tests are failing? (or if they are due to the changes here?) http://morgoth.cs.illinois.edu:5800/viewLog.html?buildId=744&buildTypeId=CogcompNlp_Build&tab=buildLog&filter=err
Likely they are, since the master is passing the tests.
There's a subtle bug that happens in the ProtobufSerializerTest. When a TA is read back from protobuf, it refuses to add a constituent because it is duplicate, but this constituent participates in some relation. The null pointer comes because there's a relation that's missing one of the argument constituents.
The original TA comes from the DummyTextAnnotationGenerator. Presumably what happens is that two constituents are created in the SRL_VERB view, identical except for the relations they participate in. These can be added separately to the view, because relations are counted in equality check.
Then, when it is reloaded, the (identical) constituents are created, added (fail), then relations are given to them.
I think the issue now is with the DummyTextAnnotationGenerator SRL_VERB view generation.
It comes down to this: should constituent equality include relations (in or out). That is, imagine I have constituents A, B, and C. Consider the link A-B. Now I create a new constituent B' that is identical in every way to B, except it is not connected to A. Should B.equals(B') = true?
This is partly an issue of how SRL frames should be instantiated: if a single argument span occurs in multiple frames, should the annotator create multiple spans or a single span shared across frames? My feeling is that it should be a single span across frames. In this case, equality check should not include incoming/outgoing relations. @danyaljj any objections?
@mayhewsw Assuming @danyaljj is on board, please make the changes to DummyTextAnnotationGenerator
. We also need to make sure SRL annotators adhere to this constraint; perhaps using pipeline with an input that reliably generates multiple verb frames sharing arguments.
Good by me.
All tests pass. Changes are substantial, so please do take a look. In general, duplicate constituents are discouraged, but it is possible to force a view to accept duplicate constituents. For example, in a parse tree, one has the label level (with label . (period)) and the leaf, which is a child, has the same span and the same label. These are technically identical, but need to be represented separately.
One big change: equality of constituents does not take relations into account at all. I think this is the right way.
Two small comments; otherwise it looks good to me.
This addresses and closes #664