ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document
Apache License 2.0

vietnamese stop words #101

Open cfreifeld opened 5 years ago

cfreifeld commented 5 years ago

This is a quick one: I'm using node-unfluff on Vietnamese-language text, and there is currently no stop words file for Vietnamese. I took this one:

https://github.com/stopwords/vietnamese-stopwords/blob/master/vietnamese-stopwords.txt

and dropped it into my data directory, and it seems to be working. So you could add this to your distributed files. Cheers.
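
For anyone who wants to try the same workaround, here is a rough sketch of the copy step using only Node's built-in `fs` and `path` modules. The helper name `installStopwords`, the `stopwords-<lang>.txt` file naming, and the `data/stopwords` directory shown in the usage comment are assumptions on my part and are not confirmed anywhere in this thread, so check them against your installed copy of the package first.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copies a downloaded stop words list into a stop words
// directory as stopwords-<lang>.txt, dropping blank lines and stray whitespace.
// ASSUMPTION: the package looks for files named this way; verify locally.
function installStopwords(sourceFile, stopwordsDir, lang) {
  const words = fs.readFileSync(sourceFile, 'utf8')
    .split(/\r?\n/)                       // handle both LF and CRLF line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);   // skip empty lines

  const target = path.join(stopwordsDir, `stopwords-${lang}.txt`);
  fs.mkdirSync(stopwordsDir, { recursive: true });
  fs.writeFileSync(target, words.join('\n') + '\n', 'utf8');
  return { target, count: words.length };
}

// Example usage (paths are assumptions, adjust to your project layout):
// installStopwords(
//   'vietnamese-stopwords.txt',
//   path.join('node_modules', 'unfluff', 'data', 'stopwords'),
//   'vi'
// );
```

After that, passing `'vi'` as the language argument to the extractor should pick up the new file, provided the naming assumption above holds. Note that anything copied into `node_modules` is lost on reinstall, which is one more reason to ship the file with the package.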