chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Anaphora resolution? #116

Open chrisspen opened 7 years ago

chrisspen commented 7 years ago

This is more of a general question or feature request.

textacy.extract.subject_verb_object_triples() is really interesting and useful, but I notice that for a lot of texts it returns triples with pronouns in the subject or object. For most NLP tasks, these anaphora need to be resolved to one of the discrete nouns seen earlier. Is there anything in textacy to accomplish this?

A naive approach would be to iterate over the results, track the last non-anaphora entity, and replace all subsequent anaphora with that entity. This will miss cases where the anaphora refers to the object or verb, but it's better than nothing.
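
For illustration, here is a minimal sketch of that naive approach. It assumes the triples have already been converted to plain (subject, verb, object) strings, whereas textacy itself yields spaCy objects, and it uses a small hand-picked pronoun list. The names resolve_naively and PRONOUNS are hypothetical and not part of textacy.

```python
# Hypothetical helper, NOT part of textacy. Triples are assumed to be
# plain (subject, verb, object) string tuples for simplicity.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "this", "that"}


def resolve_naively(triples):
    """Replace pronoun subjects/objects with the most recent non-pronoun entity."""
    last_entity = None  # most recently seen non-pronoun subject or object
    resolved = []
    for subj, verb, obj in triples:
        new_subj = subj
        if subj.lower() in PRONOUNS:
            if last_entity is not None:
                new_subj = last_entity
        else:
            last_entity = subj

        new_obj = obj
        if obj.lower() in PRONOUNS:
            if last_entity is not None:
                new_obj = last_entity
        else:
            last_entity = obj

        resolved.append((new_subj, verb, new_obj))
    return resolved


# A case where the heuristic happens to work:
print(resolve_naively([("the company", "hired", "Alice"),
                       ("she", "started", "work")]))
# → [('the company', 'hired', 'Alice'), ('Alice', 'started', 'work')]

# A case where it fails: both pronouns collapse onto the last entity seen.
print(resolve_naively([("Alice", "met", "Bob"),
                       ("She", "greeted", "him")]))
# → [('Alice', 'met', 'Bob'), ('Bob', 'greeted', 'Bob')]
```

The second call shows the weakness: because only one "last entity" is tracked, "She" and "him" both resolve to "Bob".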

bdewilde commented 7 years ago

Hey @chrisspen , thanks for the feature request. I feel your pain... I've actually tried the "naive approach" you mentioned, but found its results too poor to include in textacy. And doing anaphora resolution well is sufficiently hard that I never got around to tackling it.

So, I'll add this back into my backlog. It would be a very useful thing to have! If you have any ideas / resources, don't hesitate to post here.

bdewilde commented 7 years ago

Good news: relevant code built on spaCy was recently open-sourced. It's on my to-read list... https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30

cpetroaca commented 6 years ago

I'm also interested in this. Is there a plan to integrate it into the subject_verb_object_triples() functionality?