Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a document-keyphrase matrix.
Can you please include, at least in the documentation, the regex from the paper?
In this code, the standard is "to only select keyphrases that have 0 or more adjectives, followed by 1 or more nouns."
In the paper, the POS pattern is "arbitrary parts-of-speech separated by a hyphen, followed by zero or more nouns OR zero or one verb (gerund or present or past participle), followed by zero or more adjectives, followed by one or more nouns".
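To show why the difference matters, here is a minimal sketch, not the library's implementation. It mimics NLTK-style tag patterns with plain `re` so it runs without spaCy or NLTK. Assumptions: `<J.*>*<N.*>+` is my reading of the default described above; tags are Penn Treebank tags as spaCy's English pipelines emit them, with hyphens tagged `HYPH`; `PAPER_PATTERN` is my own hypothetical translation of the quoted prose, NOT the regex from the paper (that is exactly what I'm asking for); the helpers `tag_pattern_to_regex`, `extract_phrases`, and `join_tokens` and the tagged example sentences are made up for illustration.

```python
import re

# My reading of the documented default: 0+ adjectives followed by 1+ nouns.
DEFAULT_PATTERN = r"<J.*>*<N.*>+"

# HYPOTHETICAL translation of the paper's prose (not the paper's actual regex):
#   branch 1: arbitrary tags separated by hyphens (HYPH), then 0+ nouns
#   branch 2: optional VBG/VBN verb, then 0+ adjectives, then 1+ nouns
PAPER_PATTERN = r"(<.*><HYPH>)+<.*><N.*>*|<VB[GN]>?<J.*>*<N.*>+"


def tag_pattern_to_regex(pattern):
    """Translate an NLTK-style tag pattern into a regex over a '<TAG><TAG>...' string."""
    def convert(match):
        # '.' inside <...> must not cross a tag boundary, so map it to [^<>].
        body = match.group(1).replace(".", "[^<>]")
        # Wrap the whole tag in a group so a following quantifier applies to all of it.
        return "(?:<(?:" + body + ")>)"
    return re.sub(r"<([^<>]*)>", convert, pattern)


def join_tokens(tagged_span):
    """Join tokens with spaces, but glue hyphen tokens (tag HYPH) to their neighbours."""
    parts = []
    for i, (token, tag) in enumerate(tagged_span):
        glue = tag == "HYPH" or (i > 0 and tagged_span[i - 1][1] == "HYPH")
        if parts and not glue:
            parts.append(" ")
        parts.append(token)
    return "".join(parts)


def extract_phrases(tagged_tokens, pattern):
    """Return leftmost, non-overlapping token spans whose tags match `pattern`."""
    regex = re.compile(tag_pattern_to_regex(pattern))
    tag_string = ""
    starts = {}  # character offset in tag_string -> index of the token starting there
    for i, (_, tag) in enumerate(tagged_tokens):
        starts[len(tag_string)] = i
        tag_string += "<" + tag + ">"
    starts[len(tag_string)] = len(tagged_tokens)  # sentinel for the end of the string
    phrases = []
    for m in regex.finditer(tag_string):
        if m.start() == m.end():
            continue  # skip empty matches (possible if every element is optional)
        first, last = starts[m.start()], starts[m.end()]
        phrases.append(join_tokens(tagged_tokens[first:last]))
    return phrases


# Hand-tagged example sentences (made up for illustration).
sentence = [("a", "DT"), ("pretrained", "VBN"), ("language", "NN"), ("model", "NN"),
            ("ranks", "VBZ"), ("candidate", "NN"), ("keyphrases", "NNS")]
hyphenated = [("state", "NN"), ("-", "HYPH"), ("of", "IN"), ("-", "HYPH"),
              ("the", "DT"), ("-", "HYPH"), ("art", "NN"), ("models", "NNS")]

print(extract_phrases(sentence, DEFAULT_PATTERN))   # ['language model', 'candidate keyphrases']
print(extract_phrases(sentence, PAPER_PATTERN))     # ['pretrained language model', 'candidate keyphrases']
print(extract_phrases(hyphenated, DEFAULT_PATTERN)) # ['state', 'art models']
print(extract_phrases(hyphenated, PAPER_PATTERN))   # ['state-of-the-art models']
```

With the default pattern, "pretrained" (VBN) is dropped and "state-of-the-art models" is split apart; the pattern described in the paper would keep both. If the `pos_pattern` parameter accepts this tag-pattern syntax, a corrected paper regex could presumably be passed there directly, but I have not verified that.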