chencjiajy opened this issue 10 months ago
Hi @chencjiajy, great question.
The library considers noun chunks, and apparently spaCy parses the term `it` as one.
The coreference capabilities for spaCy are currently marked "experimental", which is a nice way to say "Good luck installing and running this part in production" :) I've evaluated multiple options for coreference (including the AllenNLP integration), and they each seem to have serious limitations. That said, if these capabilities were available, it would be relatively simple to resolve a pronoun reference within the graph. In that case, the term `it` would add more weight to `The MCU SDK` instead.
If you'd like, would it help to add the term `it` to the stop words list for your application?
Hi @ceteri, I found that adding `it` to the stop words list doesn't help, and the same goes for other single `PRON` words. Since `pos_kept` doesn't include `PRON`, there's no need to add a single `PRON` word to the stop words. In the function `_collect_phrases` in `base.py`, pytextrank sums the ranks of only those tokens that pass `_keep_token()`, so a `PRON` token (not in `pos_kept`) is excluded from the sum. As a result, a phrase consisting of a single `PRON` word always gets a rank of 0.0, so what I need to do is filter out the phrases whose rank is equal to zero.
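```python
# Excerpt from _collect_phrases() in pytextrank's base.py (not standalone):
# each span's score is the sum of the lemma ranks of its tokens that pass
# _keep_token(), so tokens whose POS is not in pos_kept (e.g. PRON) add nothing.
phrases: typing.Dict[Span, float] = {
    span: sum(
        ranks[Lemma(token.lemma_, token.pos_)]
        for token in span
        if self._keep_token(token)
    )
    for span in spans
}
```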
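To see why a lone pronoun always scores zero, here is a self-contained sketch that mimics the summation above with toy data, then applies the zero-rank filter. `Token`, `POS_KEPT`, `keep_token`, `score_spans`, and the rank values are hypothetical stand-ins, not pytextrank's actual classes; `POS_KEPT` is assumed to be ADJ, NOUN, PROPN, VERB, and `keep_token` models only the POS check (the real `_keep_token()` may check more, such as stop words).

```python
from typing import Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for a spaCy token: only the fields the sum uses."""
    text: str
    lemma_: str
    pos_: str


# Assumed POS filter; PRON is deliberately absent, as discussed in this thread.
POS_KEPT = {"ADJ", "NOUN", "PROPN", "VERB"}


def keep_token(token: Token) -> bool:
    # Simplified stand-in for _keep_token(): only the POS check is modelled.
    return token.pos_ in POS_KEPT


def score_spans(spans: List[Tuple[Token, ...]],
                ranks: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    # Same shape as the excerpt: sum the (lemma, POS) ranks of kept tokens only.
    # Starting the sum at 0.0 makes a fully filtered span score a float 0.0.
    return {
        " ".join(t.text for t in span): sum(
            (ranks[(t.lemma_, t.pos_)] for t in span if keep_token(t)), 0.0
        )
        for span in spans
    }


spans = [
    (Token("The", "the", "DET"), Token("MCU", "MCU", "PROPN"), Token("SDK", "SDK", "PROPN")),
    (Token("it", "it", "PRON"),),
]
ranks = {("MCU", "PROPN"): 0.25, ("SDK", "PROPN"): 0.5}  # illustrative values only

phrases = score_spans(spans, ranks)
# Drop phrases whose rank is exactly zero, e.g. a lone pronoun.
nonzero = {text: rank for text, rank in phrases.items() if rank != 0.0}
print(phrases)  # {'The MCU SDK': 0.75, 'it': 0.0}
print(nonzero)  # {'The MCU SDK': 0.75}
```

With pytextrank itself, the equivalent filter would run over `doc._.phrases`, keeping only the entries whose `rank` is non-zero.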
I ran the following code snippet, and the output includes the word "it", even though `pos_kept` doesn't include `PRON`.