MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License
3.54k stars 349 forks

output arrangement #138

Open iaditij opened 2 years ago

iaditij commented 2 years ago

I am getting results ordered by importance:

import pandas as pd
from keybert import KeyBERT
from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer

def keyword_exctraction(self, new_text):
    # Combine the English and Hinglish stop word lists.
    eng_stopwords = stopwords.words('english')
    hinglish_stopwords = pd.read_csv("stopwords_hinglish.csv")
    hinglish_stop_words = hinglish_stopwords['Stop_words'].tolist()
    stop = hinglish_stop_words + eng_stopwords
    # Load the embedding model once and extract up to 10 keywords.
    multilingual = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')
    kw_model = KeyBERT(model=multilingual)
    # diversity only has an effect when use_mmr=True.
    keyword = kw_model.extract_keywords(new_text, stop_words=stop, top_n=10,
                                        use_mmr=False, diversity=0.2, highlight=False)
    return keyword

text = "i want Iphone 14 purple"

keyword_extracted = self.keyword_exctraction(text)

The result I am getting is: [('iphone', 0.5916), ('purple', 0.5219), ('14', 0.272)]

But the result should follow the word order of the text: "iphone 14 purple". How can we get the results in that order with KeyBERT?

MaartenGr commented 2 years ago

You would have to reorder the output yourself to sort the keywords by their appearance in the text. By default, KeyBERT returns the keywords ordered by importance. You could, for example, use the internal CountVectorizer to split the document into words and re-order the keywords by where they appear in the text.
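A minimal sketch of that re-ordering, under a few assumptions: it does not call KeyBERT's internals, but tokenizes the document with the same regular expression scikit-learn's CountVectorizer uses as its default `token_pattern`, lowercased as CountVectorizer does by default. The helper name `order_by_appearance` is hypothetical, and multi-word keyphrases are placed by their first word only, which is an approximation.

```python
import re

# Same regex as CountVectorizer's default token_pattern (tokens of 2+ word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def order_by_appearance(doc, keywords):
    """Re-order (keyword, score) pairs by first appearance in doc.

    Hypothetical helper, not part of KeyBERT. Keywords that are not
    found in the document are placed last, keeping their relative order.
    """
    tokens = TOKEN_PATTERN.findall(doc.lower())

    # Record the index of the first occurrence of every token.
    first_pos = {}
    for index, token in enumerate(tokens):
        first_pos.setdefault(token, index)

    def position(item):
        words = item[0].lower().split()
        if not words:
            return len(tokens)
        # Approximation: a keyphrase is located by its first word.
        return first_pos.get(words[0], len(tokens))

    # sorted() is stable, so ties keep the importance order.
    return sorted(keywords, key=position)


keywords = [('iphone', 0.5916), ('purple', 0.5219), ('14', 0.272)]
ordered = order_by_appearance("i want Iphone 14 purple", keywords)
print(ordered)
# [('iphone', 0.5916), ('14', 0.272), ('purple', 0.5219)]
```

The scores are kept alongside each keyword, so the importance ranking is still available after re-ordering.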