abu-rayyan / crawper

A Python-based crawler and scraper

Consider sensibility of common phrases #8

Open ghost opened 6 years ago

ghost commented 6 years ago
    # Module-level imports assumed: math, nltk, and ngrams from nltk.util
    @staticmethod
    def find_common_phrases_in_reviews(reviews):
        """
        Finds and returns the most common phrases (trigrams) in the reviews of a product
        :param reviews: list of review rows; the first element of each row is the UTF-8 encoded review text
        :return: list of the most common phrases, each a tuple of three words
        """
        logger.debug('finding common phrases in reviews {reviews}'.format(reviews=reviews))
        # Decode each review and join them into one text
        combined_review_text = ' '.join(review[0].decode('utf-8') for review in reviews)

        trigrams = ngrams(combined_review_text.split(), 3)
        freq = nltk.FreqDist(trigrams)
        if not freq:
            return []  # fewer than three words in total, so there are no trigrams

        # Threshold: 20% of the highest trigram count, rounded up
        decision_freq = math.ceil(freq.most_common(1)[0][1] * 0.2)
        return [phrase for phrase, count in freq.most_common() if count > decision_freq]

ghost commented 6 years ago

The phrase sentiment score should be greater than 0.6.
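
A rough sketch of how that filter might look, applied to the trigram tuples returned by find_common_phrases_in_reviews. The issue does not say which sentiment scorer or score scale is intended, so everything here is an assumption: `toy_sentiment_score`, `TOY_LEXICON`, and `filter_phrases_by_sentiment` are hypothetical names, and the tiny lexicon only stands in for whatever real scorer the project ends up using.

```python
# Hypothetical toy lexicon, for illustration only; a real sentiment
# scorer would replace it.
TOY_LEXICON = {"great": 1.0, "love": 0.9, "good": 0.7, "bad": -0.7, "terrible": -1.0}


def toy_sentiment_score(phrase):
    """Return the mean lexicon score of the words in a phrase.

    Words missing from the lexicon count as 0.0, so neutral words
    pull the score of a phrase toward zero.
    """
    if not phrase:
        return 0.0
    return sum(TOY_LEXICON.get(word.lower(), 0.0) for word in phrase) / len(phrase)


def filter_phrases_by_sentiment(phrases, score_fn=toy_sentiment_score, threshold=0.6):
    """Keep only phrases whose score is strictly greater than the threshold."""
    return [phrase for phrase in phrases if score_fn(phrase) > threshold]


# Phrases are tuples of three words, the same shape as the trigrams above.
phrases = [
    ("great", "good", "love"),
    ("love", "this", "phone"),
    ("bad", "battery", "life"),
]
kept = filter_phrases_by_sentiment(phrases)
```

With this toy scorer only the first phrase passes, because the single positive word in the second phrase is averaged with two neutral ones. Passing the scorer in as `score_fn` keeps the 0.6 cut-off separate from the choice of sentiment model.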