Repository for the Linear and Nonlinear Models course at the University of Chicago. The project builds a sentiment analysis tool on the Sentiment140 dataset and covers data preprocessing, exploratory data analysis (EDA), model development, Explainable AI (XAI) methods, and causal inference techniques.
Perform comprehensive exploratory data analysis (EDA) on the Sentiment140 dataset to gain insights into the data and identify patterns. The following analyses should be included:
Class Distribution:
Create a bar chart to display the distribution of positive and negative sentiment labels.
Understand the balance of the dataset.
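A minimal sketch of this step, assuming the usual Sentiment140 label encoding (0 = negative, 4 = positive) and matplotlib for plotting. The function names and the toy `sample_labels` are illustrative and not part of this repository.

```python
from collections import Counter

# Assumed Sentiment140 encoding: 0 = negative, 4 = positive.
LABEL_NAMES = {0: "negative", 4: "positive"}

def class_distribution(labels):
    """Return {label_name: (count, share)} for an iterable of raw target values."""
    counts = Counter(LABEL_NAMES.get(int(v), f"other({v})") for v in labels)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.items()}

def plot_class_distribution(dist):
    """Draw one bar per sentiment class."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    names = list(dist)
    fig, ax = plt.subplots()
    ax.bar(names, [dist[n][0] for n in names])
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of tweets")
    ax.set_title("Class distribution")
    return ax

sample_labels = [0, 4, 4, 0, 4]  # toy values, not real data
dist = class_distribution(sample_labels)
```

The share stored next to each count gives a direct read on class balance without looking at the chart.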
Text Length Analysis:
Create a histogram to show the distribution of tweet lengths (number of characters or words).
Understand the typical length of the tweets.
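One possible way to compute the lengths and a few summary numbers before plotting is sketched below. It assumes words are separated by whitespace and that matplotlib is used for the histogram; all names and the toy `sample_texts` are hypothetical.

```python
import statistics

def tweet_lengths(texts, unit="chars"):
    """Length of each tweet in characters or in whitespace-separated words."""
    if unit == "chars":
        return [len(t) for t in texts]
    if unit == "words":
        return [len(t.split()) for t in texts]
    raise ValueError("unit must be 'chars' or 'words'")

def length_summary(lengths):
    """Min, median, mean and max as a numeric companion to the histogram."""
    return {
        "min": min(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "max": max(lengths),
    }

def plot_length_histogram(lengths, bins=30, unit="chars"):
    """Histogram of tweet lengths."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.hist(lengths, bins=bins)
    ax.set_xlabel(f"Tweet length ({unit})")
    ax.set_ylabel("Number of tweets")
    return ax

sample_texts = ["good morning", "I hate mondays so much", "ok"]  # toy data
char_lengths = tweet_lengths(sample_texts, "chars")
word_lengths = tweet_lengths(sample_texts, "words")
```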
Word Cloud:
Visualize the most common words in the tweets using a word cloud.
Get a sense of frequently occurring terms.
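The word cloud could be built from word frequencies, as in the sketch below. It assumes the third-party `wordcloud` package and matplotlib are available, strips URLs and @mentions first, and uses a deliberately tiny stopword list; the names, the stopwords, and the toy tweets are assumptions for illustration only.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real run would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "to", "and", "is", "i", "it", "of", "in", "my", "for"}

def word_frequencies(texts):
    """Count lowercase words after dropping URLs, @mentions and stopwords."""
    counts = Counter()
    for text in texts:
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        counts.update(w for w in re.findall(r"[a-z']+", cleaned) if w not in STOPWORDS)
    return counts

def plot_word_cloud(freqs):
    """Render the frequencies as a word cloud."""
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud  # third-party `wordcloud` package
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)
    fig, ax = plt.subplots()
    ax.imshow(cloud, interpolation="bilinear")
    ax.axis("off")
    return ax

sample_texts = ["@bob I love the sunshine http://t.co/x", "love my coffee", "rain again"]
freqs = word_frequencies(sample_texts)
```

Removing URLs and mentions before counting keeps usernames and link fragments from dominating the cloud.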
Sentiment Distribution by Length:
Create a box plot to compare the distribution of tweet lengths between positive and negative sentiments.
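A rough sketch of the grouping behind this box plot follows. It assumes rows arrive as `(target, text)` pairs with the 0/4 label encoding, measures length in characters, and plots with matplotlib; the helper names and sample rows are hypothetical.

```python
import statistics
from collections import defaultdict

# Assumed Sentiment140 encoding: 0 = negative, 4 = positive.
LABEL_NAMES = {0: "negative", 4: "positive"}

def lengths_by_sentiment(rows):
    """Group character lengths by sentiment name; rows are (target, text) pairs."""
    groups = defaultdict(list)
    for target, text in rows:
        groups[LABEL_NAMES.get(int(target), "other")].append(len(text))
    return dict(groups)

def plot_length_boxplot(groups):
    """One box per sentiment class."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    names = sorted(groups)
    fig, ax = plt.subplots()
    ax.boxplot([groups[n] for n in names])
    ax.set_xticklabels(names)
    ax.set_ylabel("Tweet length (characters)")
    return ax

sample_rows = [(0, "so tired"), (4, "great day today"), (0, "ugh"), (4, "yay")]
groups = lengths_by_sentiment(sample_rows)
medians = {name: statistics.median(v) for name, v in groups.items()}
```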
Common Words by Sentiment:
Create bar charts to display the top N most common words for both positive and negative sentiments.
Identify distinct language patterns.
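The per-sentiment word counts might be computed as sketched here, with one `Counter` per class. The tokenizer (lowercase letters and apostrophes), the optional `stopwords` argument, and the function names are assumptions, as is the use of matplotlib for the bar charts.

```python
import re
from collections import Counter, defaultdict

# Assumed Sentiment140 encoding: 0 = negative, 4 = positive.
LABEL_NAMES = {0: "negative", 4: "positive"}

def top_words_by_sentiment(rows, n=10, stopwords=frozenset()):
    """Return {sentiment: [(word, count), ...]} with the n most frequent words."""
    counters = defaultdict(Counter)
    for target, text in rows:
        words = re.findall(r"[a-z']+", text.lower())
        counters[LABEL_NAMES.get(int(target), "other")].update(
            w for w in words if w not in stopwords
        )
    return {name: c.most_common(n) for name, c in counters.items()}

def plot_top_words(top):
    """One horizontal bar chart per sentiment, most frequent word on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, axes = plt.subplots(1, len(top), squeeze=False)
    for ax, (name, pairs) in zip(axes[0], top.items()):
        words, counts = zip(*pairs) if pairs else ((), ())
        ax.barh(words, counts)
        ax.invert_yaxis()
        ax.set_title(f"Top words: {name}")
    return axes

sample_rows = [(4, "love love this"), (4, "love it"), (0, "hate this"), (0, "hate hate rain")]
top = top_words_by_sentiment(sample_rows, n=1)
```

Passing a shared stopword set to both classes makes the two charts easier to compare, since words common to all tweets would otherwise top both lists.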
N-grams Analysis:
Create bar charts to show the most common bi-grams and tri-grams.
Understand common phrases in the tweets.
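Bi-grams and tri-grams can be counted with a sliding window over each tweet's tokens, as in this sketch. N-grams are formed within a single tweet only, and the simple regex tokenizer, the function names, and the toy tweets are illustrative assumptions.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator over consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def top_ngrams(texts, n=2, k=10):
    """Return the k most common n-grams as (phrase, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return counts.most_common(k)

def plot_top_ngrams(pairs, title):
    """Horizontal bar chart of n-gram counts, most frequent on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    phrases = [p for p, _ in pairs]
    fig, ax = plt.subplots()
    ax.barh(phrases, [c for _, c in pairs])
    ax.invert_yaxis()
    ax.set_title(title)
    return ax

sample_texts = ["can not wait", "can not sleep", "not again"]  # toy data
bigrams = top_ngrams(sample_texts, n=2, k=1)
trigrams = top_ngrams(sample_texts, n=3, k=5)
```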
Sentiment Over Time:
Plot the sentiment distribution over time using a line chart.
Identify trends or shifts in sentiment.
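A possible sketch for the time series is shown below. It assumes Sentiment140 date strings of the form `Mon Apr 06 22:19:45 PDT 2009`, an English locale for day and month names, and the 0/4 label encoding; it summarizes each day by its share of positive tweets, which is one choice among several. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def parse_tweet_date(raw):
    """Parse strings like 'Mon Apr 06 22:19:45 PDT 2009', ignoring the zone token."""
    parts = raw.split()
    # strptime's %Z does not reliably parse abbreviations such as PDT, so drop it.
    return datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")

def daily_positive_share(rows):
    """Map each calendar day to its share of positive tweets; rows are (date_str, target)."""
    totals = defaultdict(lambda: [0, 0])  # day -> [positive count, total count]
    for raw_date, target in rows:
        day = parse_tweet_date(raw_date).date()
        totals[day][0] += 1 if int(target) == 4 else 0
        totals[day][1] += 1
    return {day: pos / n for day, (pos, n) in sorted(totals.items())}

def plot_sentiment_over_time(share):
    """Line chart of the daily positive share."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.plot(list(share), list(share.values()), marker="o")
    ax.set_ylabel("Share of positive tweets")
    fig.autofmt_xdate()
    return ax

sample_rows = [
    ("Mon Apr 06 22:19:45 PDT 2009", 0),
    ("Mon Apr 06 23:00:00 PDT 2009", 4),
    ("Tue Apr 07 01:00:00 PDT 2009", 4),
]
share = daily_positive_share(sample_rows)
```

Days with very few tweets produce noisy shares, so it may be worth plotting the daily tweet count alongside the line.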
User Analysis:
Create a bar chart to show the distribution of the number of tweets per user.
Identify any prolific tweeters in the dataset.
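The user analysis could start from two counts, sketched here: tweets per user, and how many users fall at each activity level. The log-scaled y-axis is an assumption made because such distributions are often heavy-tailed; the names and toy usernames are illustrative.

```python
from collections import Counter

def tweets_per_user(users):
    """Count tweets for each username."""
    return Counter(users)

def activity_distribution(per_user):
    """Map 'number of tweets' to 'number of users with that many tweets'."""
    return dict(sorted(Counter(per_user.values()).items()))

def plot_activity_distribution(dist):
    """Bar chart of users per activity level, with a log-scaled y-axis."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.bar(list(dist), list(dist.values()))
    ax.set_yscale("log")
    ax.set_xlabel("Tweets per user")
    ax.set_ylabel("Number of users")
    return ax

sample_users = ["ann", "bob", "ann", "cat", "ann", "bob"]  # toy data
per_user = tweets_per_user(sample_users)
dist = activity_distribution(per_user)
most_prolific = per_user.most_common(1)
```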
Hashtag Analysis:
Create a bar chart to display the most common hashtags used in the tweets.
Identify popular topics or trends.
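Hashtags can be pulled out with a regular expression, as in this sketch. It treats any `#` followed by word characters as a hashtag and folds case so that `#Spring` and `#spring` count together; both choices, and the names used, are assumptions.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

def top_hashtags(texts, k=10):
    """Return the k most common hashtags (lowercased, without '#') with counts."""
    counts = Counter(
        tag.lower() for text in texts for tag in HASHTAG_RE.findall(text)
    )
    return counts.most_common(k)

def plot_top_hashtags(pairs):
    """Bar chart of hashtag counts."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.bar([f"#{tag}" for tag, _ in pairs], [count for _, count in pairs])
    ax.set_ylabel("Number of uses")
    ax.tick_params(axis="x", rotation=45)
    return ax

sample_texts = ["Loving #Spring and #coffee", "#spring is here", "no tags here"]
hashtags = top_hashtags(sample_texts, k=2)
```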
Sentiment by User:
Create a box plot of per-user sentiment (for example, each user's share of positive tweets) to see how sentiment varies across users.
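One way to read this step is to summarize each user by their share of positive tweets and plot the distribution of those shares. The sketch below takes that interpretation, which is an assumption, along with the 0/4 label encoding, the `min_tweets` filter, and all names used.

```python
from collections import defaultdict

def positive_share_per_user(rows, min_tweets=1):
    """Return {user: share of positive tweets} for users with at least min_tweets.

    rows are (user, target) pairs; target 4 is assumed to mean positive.
    """
    stats = defaultdict(lambda: [0, 0])  # user -> [positive count, total count]
    for user, target in rows:
        stats[user][0] += 1 if int(target) == 4 else 0
        stats[user][1] += 1
    return {u: pos / n for u, (pos, n) in stats.items() if n >= min_tweets}

def plot_user_sentiment(shares):
    """Single box summarizing the per-user positive shares."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.boxplot(list(shares.values()))
    ax.set_xticklabels(["users"])
    ax.set_ylabel("Share of positive tweets")
    return ax

sample_rows = [("ann", 4), ("ann", 4), ("ann", 0), ("bob", 0), ("bob", 0), ("cat", 4)]
shares = positive_share_per_user(sample_rows, min_tweets=2)
```

Filtering with `min_tweets` avoids a plot dominated by single-tweet users, whose share can only be 0 or 1.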