Context

Thank you for all of the work you have put into this library; it has helped me immensely! While I am inexperienced at submitting pull requests on major repos and blissfully ignorant of what goes into compatibility testing and proper documentation generation, I did not want to let that deter me from submitting something.

For quote attribution, triples.py currently relies on constants.REPORTING_VERBS. The comment on line 201 of triples.py shows interest in implementing a model to perform this functionality.
Proposed solution
This solution would rely on an additional dependency, spacy-wordnet, which in turn relies on nltk. Instead of string-matching token lemmas against a list of reporting verbs, it may be possible to access the sense tagging from the WordNet corpus and use that instead.
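As an illustration of that sense tagging, here is a minimal sketch, assuming spacy-wordnet is installed and the nltk WordNet corpora (wordnet, omw) have been downloaded; the example sentence and token index are mine, not from the library:

import spacy

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe("spacy_wordnet", after='tagger')

doc = nlp("She said it would rain.")
tok = doc[1]  # "said"
lemmas = tok._.wordnet.lemmas()
if lemmas:
    # lexname() gives the lexicographer file of the sense, e.g. 'verb.communication'
    print(lemmas[0].synset().lexname())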
Beyond the additional support required for the increased dependencies, I have found the following solution (which requires two changes) to work for me.
When calling make_spacy_doc:

import spacy
import textacy

# en_core_web_trf is not required here; this also works with en_core_web_sm
nlp = spacy.load('en_core_web_trf')
nlp.add_pipe("spacy_wordnet", after='tagger')
doc = textacy.make_spacy_doc(text, lang=nlp)  # text is the string to process
The following could possibly be implemented with some sort of config option in core.py, or added by the user in their own function.
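As a sketch of the user-function option (the wrapper name and default model are placeholders of mine, not an existing textacy API):

import spacy
import textacy

def make_wordnet_doc(text, model='en_core_web_sm'):
    # build a pipeline with the spacy-wordnet annotator, then hand it to textacy
    nlp = spacy.load(model)
    nlp.add_pipe("spacy_wordnet", after='tagger')
    return textacy.make_spacy_doc(text, lang=nlp)

In practice the loaded pipeline would probably be cached rather than rebuilt on every call.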
Then the check on line 201 of triples.py would change from:

tok.pos == VERB and tok.lemma_ in _reporting_verbs

to:

tok.pos == VERB and tok._.wordnet.lemmas() and tok._.wordnet.lemmas()[0]._synset._lexname == 'verb.communication'
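For completeness, the proposed check could also be wrapped as a standalone predicate; this is a minimal sketch (the helper name is hypothetical, and it uses nltk's public synset()/lexname() accessors rather than the private attributes above):

from spacy.symbols import VERB

def is_reporting_verb(tok):
    # True if tok is a verb whose first WordNet sense is a communication verb
    lemmas = tok._.wordnet.lemmas()
    return tok.pos == VERB and bool(lemmas) and lemmas[0].synset().lexname() == 'verb.communication'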