Description
- Move keyterm extraction functionality from a top-level keyterms module into a ke sub-package, and refactor and standardize its contents
  - all methods have similar args/options, and share code for selecting candidates, normalizing terms to strings, filtering to just the top-N key terms, and building term graphs
- Add new unsupervised keyterm extraction algorithms
  - YAKE: statistical method, implemented in ke.yake()
  - sCAKE: graph-based method, implemented in ke.scake()
  - PositionRank: graph-based method, implemented in ke.textrank() with parameter values given in the docstring
- Add new functionality for selecting candidate keyterms (in addition to n-grams method)
  - longest matching subsequence candidates: implemented in ke.utils.get_longest_subsequence_candidates()
  - pattern-matching candidates: implemented in ke.utils.get_pattern_matching_candidates()
- Significantly improve speed of SGRank and generally optimize all of these algorithms
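
For reviewers unfamiliar with PositionRank, here is a rough, standalone sketch of the idea behind it: a TextRank-style PageRank over a word co-occurrence graph, biased toward words that appear early in the text. This is not the code in ke.textrank(). The function name position_biased_rank and its parameters (window_size, damping, iterations) are hypothetical and chosen only for illustration, and it works on a plain list of already-normalized words rather than a parsed doc.

```python
from collections import defaultdict


def position_biased_rank(words, window_size=2, damping=0.85, iterations=50):
    """Rank words by PageRank biased toward early positions (illustrative only)."""
    # Build an undirected co-occurrence graph: words that fall within
    # `window_size` tokens of each other are linked, weighted by count.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window_size]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Position bias: each occurrence at 1-based position p contributes 1/p,
    # then the totals are normalized so they sum to 1.
    bias = defaultdict(float)
    for position, word in enumerate(words, start=1):
        bias[word] += 1.0 / position
    total_bias = sum(bias.values())
    bias = {word: value / total_bias for word, value in bias.items()}

    # Power iteration for PageRank with the position bias as the
    # personalization (teleport) vector.
    scores = dict(bias)
    for _ in range(iterations):
        new_scores = {}
        for word in bias:
            incoming = sum(
                scores[nbr] * weight / sum(weights[nbr].values())
                for nbr, weight in weights[word].items()
            )
            new_scores[word] = (1 - damping) * bias[word] + damping * incoming
        scores = new_scores

    # Highest-scoring words first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = position_biased_rank(
    ["keyterm", "extraction", "graph", "keyterm", "ranking", "graph", "algorithm"]
)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

With the bias replaced by a uniform distribution, the same loop reduces to plain TextRank-style scoring, which is why one function with different parameter values can plausibly cover both variants.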
Motivation and Context
I'm still hunting for the "perfect" unsupervised keyterm extraction algorithm, although each of these methods has its pros and cons. A literature review of recent results pointed me toward YAKE and sCAKE.
How Has This Been Tested?
Added lots of tests, and they all pass.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation, and I have updated it accordingly.