Keep track of reference chain in MuDA

nightingal3 commented 2 years ago

Currently, we track whether a sentence has any links to the previous sentence with booleans. If we could keep track of these references explicitly, we could automatically change them to create contrastive datasets for certain phenomena to measure a model's context sensitivity. This could be difficult to do, but would remove the dependence on non-contextual baselines.
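A minimal sketch of what tracking references explicitly, rather than as booleans, could look like. Every name here (`Mention`, `ReferenceChain`, `make_contrastive`) is hypothetical and not part of MuDA. The chain is assumed to come from some coreference system, and the document is assumed to be a list of tokenized sentences.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Mention:
    """One mention in a chain (hypothetical structure, not MuDA's)."""
    sent_idx: int  # index of the sentence within the document
    start: int     # token offset where the mention starts
    end: int       # token offset one past the last mention token
    text: str


@dataclass
class ReferenceChain:
    """All mentions that refer to the same entity."""
    mentions: List[Mention] = field(default_factory=list)

    def crosses_sentences(self) -> bool:
        # Equivalent to the current boolean: does the chain span
        # more than one sentence?
        return len({m.sent_idx for m in self.mentions}) > 1

    def antecedent(self) -> Mention:
        # Earliest mention in document order.
        return min(self.mentions, key=lambda m: (m.sent_idx, m.start))


def make_contrastive(doc: List[List[str]], chain: ReferenceChain,
                     replacement: List[str]) -> List[List[str]]:
    """Return a copy of doc with the chain's antecedent swapped out."""
    new_doc = [list(sent) for sent in doc]  # leave the original untouched
    ante = chain.antecedent()
    new_doc[ante.sent_idx][ante.start:ante.end] = replacement
    return new_doc


doc = [["The", "actress", "arrived", "."], ["She", "smiled", "."]]
chain = ReferenceChain([
    Mention(0, 0, 2, "The actress"),
    Mention(1, 0, 1, "She"),
])
contrastive = make_contrastive(doc, chain, ["The", "actor"])
```

With the antecedent changed, a context-sensitive model should prefer a different translation of "She" in the second sentence, which is the kind of contrast the dataset would measure.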

CoderPat commented 2 years ago

This is a very valid point. However, that co-reference resolution would be more expensive, since we would be running co-reference against the whole document rather than just the current sentence. I'm also not sure how the performance of co-reference resolution degrades with longer contexts. But overall, if models are just as good at doing coreference outside the sentence as inside it, then I think this is a good change. Also, checking whether there is a more native spaCy co-reference system could potentially make a lot of the tokenization pains go away.

neubig commented 2 years ago

Coref is definitely harder outside the sentence, but it still might be good enough with recent models.

Here's a spacy-native coref toolkit, not sure of the quality: https://spacy.io/universe/project/neuralcoref

nightingal3 commented 2 years ago

Looks good. I can investigate adding it after the current refactor and tests are merged.

CoderPat / MuDA

Keep track of reference chain in MuDA #11