chrisspen opened this issue 7 years ago.
Hey @chrisspen, thanks for the feature request. I feel your pain. I've actually tried the "naive approach" you mentioned, but its results were too poor to include in textacy. Doing anaphora resolution well is hard enough that I never got around to tackling it.
So, I'll add this back into my backlog. It would be a very useful thing to have! If you have any ideas or resources, don't hesitate to post them here.
Good news: Hugging Face recently open-sourced a neural coreference resolution system built on spaCy. It's on my to-read list: https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
I'm also interested in this. Is there a plan to integrate it into the subject_verb_object_triples() functionality?
This is more of a general question or feature request.
The `textacy.extract.subject_verb_object_triples()` function is really interesting and useful, but for a lot of texts it returns triples with pronouns in the subject or object. For most NLP tasks, these anaphora need to be resolved to one of the concrete nouns seen earlier. Is there anything in textacy to accomplish this? A naive approach would be to iterate over the results, track the last non-anaphora entity, and replace all subsequent anaphora with that entity. This will miss cases where the anaphora refers to the object or verb, but it's better than nothing.
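For illustration, here is a minimal sketch of that naive "last entity wins" heuristic. It makes several assumptions not in the original post: it works on plain `(subject, verb, object)` string tuples rather than textacy's own return type (you would convert the extracted spans to text first), "entity" is read as the most recent non-pronoun *subject*, and the `PRONOUNS` set and the `is_anaphor` / `resolve_naively` helpers are hypothetical names, not part of textacy.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical pronoun list used to detect anaphora; this is not a textacy API.
PRONOUNS = {
    "he", "she", "it", "they", "him", "her", "them",
    "his", "hers", "its", "their", "theirs", "this", "that",
}


def is_anaphor(text: str) -> bool:
    """Return True if the phrase is a bare pronoun from PRONOUNS."""
    return text.strip().lower() in PRONOUNS


def resolve_naively(triples: Iterable[Triple]) -> List[Triple]:
    """Replace pronoun subjects/objects with the last non-pronoun subject seen.

    Assumption: "last entity" means the most recent concrete subject. The
    heuristic ignores gender, number, and syntax, so it picks the wrong
    antecedent whenever a pronoun refers to an earlier object.
    """
    last_subject: Optional[str] = None
    resolved: List[Triple] = []
    for subj, verb, obj in triples:
        if is_anaphor(subj):
            # Keep the pronoun if no antecedent has been seen yet.
            if last_subject is not None:
                subj = last_subject
        else:
            last_subject = subj
        if is_anaphor(obj) and last_subject is not None:
            obj = last_subject
        resolved.append((subj, verb, obj))
    return resolved


if __name__ == "__main__":
    triples = [
        ("Marie Curie", "discovered", "polonium"),
        ("she", "won", "the Nobel Prize"),
        ("The committee", "praised", "her"),
    ]
    for triple in resolve_naively(triples):
        print(triple)
```

The second triple resolves correctly to "Marie Curie", but the third shows the weakness discussed above: "her" is replaced with "The committee" because the heuristic only tracks the most recent subject.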