Closed by Phylanxy 10 months ago
OK, my bad, I just found my way to the documentation - somehow I wasn't able to find it before.
Here is the answer to my question - maybe someone else will find it useful as well: https://maartengr.github.io/KeyBERT/guides/countvectorizer.html
I am trying to extract keywords from Google Maps restaurant reviews so that I can give the restaurants some meaningful labels. There are some words I don't want to include because they are meaningless in my context, such as "restaurant" or "cuisine" - at least as single words. My problem is that if I remove these words from the text beforehand, words that were not adjacent in the original review end up next to each other, which can produce wrong pairings when using n-grams of 2 or more. I also tried removing them afterwards, but that has its own drawbacks: if I remove every phrase containing an unwanted word, a lot of phrases are lost, and if I only strip the unwanted words out of the phrases, I end up with false keywords.
Is there a way to do this more elegantly?
Subquestion: I tried using my own list of stop words (which worked great with BERTopic) but only got an empty list as output. Is there a way to do this in KeyBERT?
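For anyone with the same problem, here is a minimal sketch of a middle ground between the two post-filtering options described above. It is plain Python, not a KeyBERT feature: it drops a phrase only when every word in it is unwanted, so "restaurant" on its own is removed while "italian restaurant" is kept intact. The function name `drop_standalone_unwanted` and the sample scores are made up for illustration; the input is assumed to be a list of `(phrase, score)` tuples, which is the shape KeyBERT's `extract_keywords` returns for a single document.

```python
def drop_standalone_unwanted(keywords, unwanted):
    """Keep a (phrase, score) pair unless every token in the phrase is unwanted.

    Hypothetical helper, not part of KeyBERT.
    """
    unwanted = {word.lower() for word in unwanted}
    kept = []
    for phrase, score in keywords:
        tokens = phrase.lower().split()
        # Drop empty phrases and phrases made up only of unwanted words;
        # phrases that merely contain an unwanted word are left untouched.
        if tokens and not all(token in unwanted for token in tokens):
            kept.append((phrase, score))
    return kept


# Made-up example in the (phrase, score) format.
keywords = [
    ("restaurant", 0.61),
    ("italian restaurant", 0.58),
    ("cuisine", 0.40),
    ("restaurant cuisine", 0.35),
    ("wood fired pizza", 0.52),
]

filtered = drop_standalone_unwanted(keywords, ["restaurant", "cuisine"])
print(filtered)
```

Because the phrases are filtered after extraction, the n-grams are still built from the original, unmodified review text, so no artificial word pairings are introduced.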