Montreal-Analytics / dbt-snowflake-utils

Snowflake-specific utility macros for dbt projects.
Apache License 2.0

Feat/sentiment analysis #34

Open tbittencourt opened 1 year ago

tbittencourt commented 1 year ago

This macro iterates through a piece of text to return the overall sentiment of that text.

First, the macro pre-processes the text, removing punctuation and stopwords to help improve the model's accuracy. It then uses the transformers library to apply a sentiment analysis pipeline based on a pre-trained model, which returns either a score or a label for the text.
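The pre-processing step described above could look roughly like the sketch below. It is illustrative only and is not the macro's actual code: the `preprocess` name and the tiny `STOPWORDS` set are assumptions, and a real implementation would more likely take its stopword list from a library.

```python
import string

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "this"}


def preprocess(text: str) -> str:
    """Strip ASCII punctuation and stopwords, keeping the original word order."""
    # Delete every character listed in string.punctuation.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Compare in lowercase so "This" and "this" are both dropped.
    tokens = [t for t in no_punct.split() if t.lower() not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("This is a great product, and the delivery was fast!"))
# prints: great product delivery fast
```

The cleaned string is what would then be handed to the sentiment analysis pipeline. Note that stripping all punctuation also removes apostrophes (for example "don't" becomes "dont"), which may or may not be desirable for a given model.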

The following popular models are recommended:

  1. cardiffnlp/twitter-roberta-base-sentiment-latest: (https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) This model is trained on 124M tweets from January 2018 to December 2021, and is fine-tuned for sentiment analysis. It outputs a label (Negative, Neutral or Positive) and a score ranging from 0 to 1. In the transformers pipeline the score is the model's confidence in the predicted label, not a position on a negative-to-positive scale.

  2. nlptown/bert-base-multilingual-uncased-sentiment: (https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) This model is fine-tuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It outputs a label (1 to 5 stars) and a score ranging from 0 to 1. In the transformers pipeline the score is the model's confidence in the predicted star rating, not a position on a negative-to-positive scale.

The macro returns a STRING data type. If 'score' is chosen as the output, the result has to be cast to the FLOAT data type.
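To illustrate why the cast is needed, here is a minimal sketch of how a single return type can carry either output. The `select_output` helper and the sample values are hypothetical; the only thing assumed about the library is that a transformers text-classification pipeline yields dictionaries with `label` and `score` keys.

```python
def select_output(result: dict, output: str = "label") -> str:
    """Return the requested field of one pipeline result as a string."""
    if output not in ("label", "score"):
        raise ValueError("output must be 'label' or 'score'")
    # Both fields are stringified so the function has one return type,
    # mirroring a UDF declared to return STRING.
    return str(result[output])


# Made-up sample shaped like one pipeline result; the numbers are not real output.
sample = {"label": "positive", "score": 0.9731}

label = select_output(sample, "label")          # stays a string
score = float(select_output(sample, "score"))   # caller casts back to a float
print(label, score)
```

On the Snowflake side the same idea applies: the value arrives as a string, so a numeric score has to be cast to FLOAT before it can be aggregated or compared.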

Mayurjit commented 2 months ago

function name and handler should be equivalent