TimSchopf / KeyphraseVectorizers

Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a document-keyphrase matrix.
https://arxiv.org/abs/2210.05245
BSD 3-Clause "New" or "Revised" License
251 stars, 34 forks

It does not exclude stop words in Portuguese #31

Closed: phuclh closed this issue 4 months ago

phuclh commented 1 year ago

I am testing a document in Portuguese, but the stop words are not excluded from the result, even though I have already set stop_words='portuguese'.

[Screenshot: CleanShot 2023-06-19 at 22 50 08@2x]

This is the document I tested with:

portuguese_doc.txt
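For readers on an affected version, one possible workaround is to post-filter the extracted keyphrases. The sketch below is only an illustration, not the library's own fix: the helper name `drop_stop_word_keyphrases`, the hard-coded stop word subset, and the sample candidates are all assumptions made for this example. A real run would take the candidates from the vectorizer and use a full Portuguese stop word list.

```python
# Hypothetical post-filter for extracted keyphrases. Nothing here is part of
# KeyphraseVectorizers; the names and data are assumptions for illustration.

# Small illustrative subset of Portuguese stop words (assumption; a real
# list is much longer).
PT_STOP_WORDS = {
    "a", "o", "as", "os", "de", "do", "da", "em",
    "para", "que", "e", "um", "uma", "com", "não",
}


def drop_stop_word_keyphrases(keyphrases, stop_words=PT_STOP_WORDS):
    """Keep keyphrases that neither start nor end with a stop word."""
    kept = []
    for phrase in keyphrases:
        tokens = phrase.lower().split()
        if not tokens:
            continue  # skip empty strings
        if tokens[0] in stop_words or tokens[-1] in stop_words:
            continue  # a single-token stop word is caught here as well
        kept.append(phrase)
    return kept


# Made-up candidates standing in for the vectorizer's output.
candidates = ["que", "aprendizado de máquina", "para", "modelo", "uma rede"]
filtered = drop_stop_word_keyphrases(candidates)
print(filtered)
```

Stop words inside a phrase (such as "de" in "aprendizado de máquina") are deliberately kept, since only leading or trailing stop words make a keyphrase look wrong.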

TimSchopf commented 4 months ago

Solved with the v0.0.12 release.