Open nightingal3 opened 2 years ago
This is a very valid point. However, that co-reference resolution would be more expensive, since we would be running co-reference against the whole document rather than just the current sentence, and I'm not sure how co-reference resolution performance degrades with longer contexts. But overall, if models are just as good at resolving coreference outside the sentence as inside it, then I think this is a good change. Also, checking whether there is a more native spaCy co-reference system could potentially make a lot of the tokenization pains go away.
Coref is definitely harder outside the sentence, but it still might be good enough with recent models.
Here's a spacy-native coref toolkit, not sure of the quality: https://spacy.io/universe/project/neuralcoref
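For reference, here is a minimal sketch of what using it might look like, based on the attributes neuralcoref documents (`has_coref`, `coref_clusters`, `coref_resolved`); note that neuralcoref targets spaCy 2.x, so compatibility with our current pipeline would need to be checked:

```python
import spacy
import neuralcoref

# Load a standard spaCy pipeline and attach neuralcoref to it.
nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)

doc = nlp("My sister adopted a dog last year. She says he is very friendly.")

# Document-level clusters: each cluster groups mentions that co-refer.
if doc._.has_coref:
    for cluster in doc._.coref_clusters:
        print(cluster.main, "->", [m.text for m in cluster.mentions])

# Text with each mention replaced by its cluster's main antecedent.
print(doc._.coref_resolved)
```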
Looks good; I can investigate adding it after the current refactor + tests are merged.
Currently, we use booleans to track whether a sentence has any links to the previous sentence. If we instead kept track of these references explicitly, we could automatically alter them to create contrastive datasets for certain phenomena and measure a model's context sensitivity. This could be difficult to do, but it would remove the dependence on non-contextual baselines.
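As a rough sketch of what "explicit" tracking could look like (all names below are hypothetical and not existing code in the repo), the booleans could be replaced by a small record per cross-sentence link, which would then be trivial to perturb for contrastive examples:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CrossSentenceLink:
    """One coreference link from the current sentence back to an earlier one.

    All fields are hypothetical; the real structure would depend on how
    sentences and spans are represented in the existing pipeline.
    """
    sent_index: int        # index of the current sentence in the document
    mention: str           # surface form in the current sentence, e.g. "she"
    antecedent_sent: int   # index of the sentence containing the antecedent
    antecedent: str        # surface form of the antecedent, e.g. "the teacher"

def make_contrastive(links: List[CrossSentenceLink], distractor: str) -> List[CrossSentenceLink]:
    """Swap each antecedent for a distractor to build a contrastive counterpart."""
    return [
        CrossSentenceLink(l.sent_index, l.mention, l.antecedent_sent, distractor)
        for l in links
    ]
```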