LetianFeng closed this issue 3 years ago.
I didn't see that this had already been reported. I was adding type hints and caught this bug myself too, just later than you did. In v1.0.5 this is exposed as a feature: a flag lets the user choose whether the phrase list contains only unique phrases or keeps repeated ones, as indicated here.
This implementation ignores phrases that occur more than once. For example:
text = 'Red apples, are good in flavour. Where are my red apples? Apples!'
According to the paper, we should get a phrase list like:
['red apples', 'good', 'flavour', 'red apples', 'apples']
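The expected list can be reproduced with a minimal sketch of RAKE-style candidate extraction, which splits the lowercased text at stopwords and punctuation. The four-word stopword set and the regex tokenizer are assumptions chosen only for this example. rake-nltk itself relies on NLTK's stopword list and tokenizers, so treat this as an illustration, not the library's code.

```python
import re

# Hypothetical stopword set: just enough words for this example.
# rake-nltk uses NLTK's English stopword list instead.
STOPWORDS = {"are", "in", "where", "my"}


def candidate_phrases(text):
    """Split text into candidate phrases, keeping repeats in order."""
    phrases = []
    current = []
    # Tokenize into word runs and punctuation runs, lowercased.
    for token in re.findall(r"\w+|[^\w\s]+", text.lower()):
        if token in STOPWORDS or not token[0].isalnum():
            # A stopword or punctuation mark closes the current phrase.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


text = 'Red apples, are good in flavour. Where are my red apples? Apples!'
print(candidate_phrases(text))
# ['red apples', 'good', 'flavour', 'red apples', 'apples']
```

Note that 'red apples' appears twice because the result is a list. Collecting the same phrases into a set would silently merge them.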
So the correctly ranked phrases should be:
However, in the current implementation, the extracted phrase list is:
['red apples', 'good', 'flavour', 'apples']
The second occurrence of 'red apples' is dropped, so the ranked phrases get wrong scores:
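The size of the error can be worked out with RAKE's degree-to-frequency ratio. Each word w scores deg(w)/freq(w), where deg(w) adds the phrase length for every phrase occurrence containing w, and a phrase scores the sum of its word scores. The standalone helper rake_scores below is a hypothetical sketch, not part of any library. It compares the phrase list with repeats against the deduplicated one.

```python
from collections import Counter, defaultdict


def rake_scores(phrases):
    """Score phrases with the degree-to-frequency ratio (hypothetical helper)."""
    freq = Counter()
    degree = defaultdict(int)
    for phrase in phrases:
        words = phrase.split()
        for word in words:
            freq[word] += 1
            # deg(w) counts co-occurrences inside the phrase, w itself included.
            degree[word] += len(words)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Dict keys deduplicate phrases; each phrase is the sum of its word scores.
    return {p: sum(word_score[w] for w in p.split()) for p in phrases}


with_repeats = ['red apples', 'good', 'flavour', 'red apples', 'apples']
deduplicated = ['red apples', 'good', 'flavour', 'apples']

for name, phrases in [("with repeats", with_repeats), ("deduplicated", deduplicated)]:
    ranked = sorted(rake_scores(phrases).items(), key=lambda kv: kv[1], reverse=True)
    print(name, [(p, round(s, 2)) for p, s in ranked])
# with repeats [('red apples', 3.67), ('apples', 1.67), ('good', 1.0), ('flavour', 1.0)]
# deduplicated [('red apples', 3.5), ('apples', 1.5), ('good', 1.0), ('flavour', 1.0)]
```

With the repeat kept, 'apples' scores 5/3 instead of 3/2, which raises both 'red apples' and 'apples'. The score for 'red' stays at 2.0 either way.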
This bug can be fixed very easily: simply change the functions extract_keywords_from_sentences and _generate_phrases as shown below:
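One possible shape of such a fix is sketched here under stated assumptions; it is not the library's code. The phrase list is built as a list so repeated phrases survive, word frequency and degree are computed over every occurrence, and duplicates are removed only when the ranked output is produced. The names generate_phrases and rank_phrases are hypothetical, and the input is assumed to be already split into per-sentence candidate phrases.

```python
from collections import Counter, defaultdict


def generate_phrases(sentences):
    """Collect candidate phrases from pre-split sentences (hypothetical helper).

    Extending a list, rather than updating a set, keeps every occurrence,
    so a repeated phrase is counted each time it appears.
    """
    phrase_list = []
    for sentence_phrases in sentences:
        phrase_list.extend(sentence_phrases)
    return phrase_list


def rank_phrases(phrase_list):
    """Compute statistics on all occurrences; deduplicate only when ranking."""
    freq = Counter(w for p in phrase_list for w in p.split())
    degree = defaultdict(int)
    for p in phrase_list:
        words = p.split()
        for w in words:
            degree[w] += len(words)
    # The dict comprehension keeps one entry per distinct phrase.
    scores = {p: sum(degree[w] / freq[w] for w in p.split()) for p in phrase_list}
    return sorted(((s, p) for p, s in scores.items()), reverse=True)


sentences = [["red apples", "good", "flavour"], ["red apples"], ["apples"]]
for score, phrase in rank_phrases(generate_phrases(sentences)):
    print(f"{score:.2f}  {phrase}")
# 3.67  red apples
# 1.67  apples
# 1.00  good
# 1.00  flavour
```

Keeping duplicates in the statistics while deduplicating the output is what gives 'red apples' 3.67 instead of 3.5. An order-preserving list(dict.fromkeys(phrase_list)) would be the natural way to offer a unique-phrases option on top of this.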