MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License
3.31k stars 337 forks

Excluding certain words #179

Closed Phylanxy closed 10 months ago

Phylanxy commented 10 months ago

I am trying to extract keywords from Google Maps restaurant reviews so that I can give the restaurants meaningful labels. There are some words I don't want to include because they are meaningless in my context, such as "restaurant" or "cuisine", at least as single words. My problem is that if I remove these words beforehand, words that were not adjacent become neighbors, which can lead to wrong pairings when using n-grams of 2 or more. I also tried removing them afterwards, but that either removes a lot of phrases (if I drop every phrase containing an unwanted word) or produces false keywords (if I only strip the unwanted words from the phrases).

Is there a way to do this more elegantly?

Subquestion: I tried using my own list of stopwords (which worked great with BERTopic) but only got an empty list as output. Is there a way to do this in KeyBERT?
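
For readers with the same n-gram concern, here is a minimal sketch of one possible workaround in plain Python. It is not taken from the KeyBERT documentation. It builds the n-grams from the untouched token sequence and drops only those candidates that consist entirely of unwanted words, so no artificial pairings are created. The helper names `tokenize` and `candidate_phrases` and the sample review are hypothetical, made up for illustration.

```python
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def candidate_phrases(text: str, excluded: Iterable[str], max_n: int = 2) -> List[str]:
    """Build 1..max_n-grams from the original token order (hypothetical helper).

    A candidate is dropped only if every token in it is an excluded word, so
    "restaurant" disappears as a single word while "italian restaurant" stays.
    """
    excluded_set: Set[str] = {w.lower() for w in excluded}
    tokens = tokenize(text)
    seen: Set[str] = set()
    candidates: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip candidates made up solely of unwanted words.
            if all(tok in excluded_set for tok in gram):
                continue
            phrase = " ".join(gram)
            if phrase not in seen:  # keep first occurrence, preserve order
                seen.add(phrase)
                candidates.append(phrase)
    return candidates


# Made-up review text, for illustration only.
review = "Great Italian restaurant, the cuisine was authentic"
candidates = candidate_phrases(review, excluded=["restaurant", "cuisine"])
```

As far as I know, `extract_keywords` accepts a `candidates` argument, so a list like this could be passed in instead of letting KeyBERT generate its own candidates. Whether the phrases are then matched as intended probably depends on the n-gram range and stop-word settings, so treat this as an untested assumption and check the documentation.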

Phylanxy commented 10 months ago

Ok, my bad. I just found my way to the documentation; somehow I wasn't able to find it before.

Here is the answer to my question - maybe someone else will find it useful as well: https://maartengr.github.io/KeyBERT/guides/countvectorizer.html