An app that assesses the mood of the country based on digital data.
Please visit our app at https://digitalaffect.herokuapp.com/
DigitalAffect is at the confluence of sentiment analysis, social media proliferation, and machine learning.
We felt machine learning would be the most efficient way to analyse public opinion en masse, because it automates the sentiment analysis process. We arrived at this conclusion by researching popular tools in this field. The majority of the researchers we read about had chosen the Python-based NLTK platform for its relatively extensive corpus collection and its built-in features for tailoring machine learning.
We chose Chart.js for its sleek display interface and its comprehensive functionality.
We chose Twitter for its convenient character limit, which was ideal for working with our classifier to demonstrate public opinion.
As a User,
So that the UK's mood can be assessed,
I want to be able to enter a sentence on the page.
As a User,
So that the UK's mood can be classified as positive or negative,
I want to see the word 'positive' or 'negative'.
As a User,
So that I can get an idea about what people in the UK are thinking about a topic,
I want to be able to see the percentage of people who are positive or negative about it.
As a User,
So that I have confidence I am seeing the UK public's unbiased opinion,
I want the data to come from UK-based Twitter Users via their tweets.
As a User,
So that it's easy to use the site,
I want it to have a simple interface.
As a User,
So that I can enjoy using the site,
I want the site to look nice.
As a User,
So that I can have a visual representation of the output,
I would like to see the percentages of positive and negative views represented as a pie chart.
As a User,
So I can see people's opinions,
I want to be able to see the tweets from my search results.
As a User,
So I can make another search,
I want to be able to click back to the homepage.
We planned out our hopes for our first 3 MVPs. The first was a simple text input with a positive or negative string output displayed. We reached this on the second day.
Version 1 was a pie chart, generated from the data given, that showed multiple moods. In reality we had to scale this back to just positive and negative moods because refining our classifier proved challenging.
Version 2 was the shiny version of our efforts with a scrolling tweet bar so you can see UK Twitter's opinions for yourself. We coupled this with added classifier refinement, though we recognise in 2 weeks it's unlikely to be perfect.
We looked at various classifiers available to use with the NLTK Python platform for human language processing. After testing each one for accuracy, we chose Naive Bayes, as it gave the most consistently high accuracy ratings.
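As a rough illustration of that kind of accuracy check, here is a minimal sketch that trains NLTK's `NaiveBayesClassifier` on a tiny hand-written sample and scores it on held-out sentences. The sentences, the `word_features` helper and the `classify_sentence` helper are assumptions made up for this example, not the project's actual data or code.

```python
from nltk.classify import NaiveBayesClassifier, accuracy


def word_features(sentence):
    """Bag-of-words features: every lowercase token maps to True."""
    return {word: True for word in sentence.lower().split()}


# Tiny, invented training and test samples (illustrative only).
TRAIN = [
    ("i love this great sunny day", "positive"),
    ("what a wonderful happy result", "positive"),
    ("great news i am so happy", "positive"),
    ("i hate this awful rainy day", "negative"),
    ("what a terrible sad result", "negative"),
    ("awful news i am so sad", "negative"),
]
TEST = [
    ("such a happy wonderful day", "positive"),
    ("such a sad terrible day", "negative"),
]

train_set = [(word_features(text), label) for text, label in TRAIN]
test_set = [(word_features(text), label) for text, label in TEST]

# Train the classifier, then measure the share of test items it labels correctly.
classifier = NaiveBayesClassifier.train(train_set)
test_accuracy = accuracy(classifier, test_set)


def classify_sentence(sentence):
    """Return 'positive' or 'negative' for a single sentence."""
    return classifier.classify(word_features(sentence))


print("accuracy:", test_accuracy)
print(classify_sentence("a great happy day"))
```

Running the same `accuracy` call against other NLTK classifiers trained on the same `train_set` would give a like-for-like comparison, although a sample this small says little about real-world performance.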
We began refining our classifier to increase the overall accuracy of its category matching. This involved tailoring stopwords for our purposes and filtering out extreme frequency values from our features. It works by removing the most common useless words from our classifier's vocabulary, and then removing features that would skew the overall accuracy of the classifier's matching process, so that no feature ends up with a 0% or 100% value.
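One possible reading of these two refinement steps is sketched below in plain Python. The stopword list, the sample sentences and the rule "keep only words seen under both labels" are assumptions chosen for illustration; the project's real stopwords and thresholds are not given here.

```python
from collections import Counter

# Hypothetical tailored stopword list (the project's actual list is not shown).
CUSTOM_STOPWORDS = {"the", "a", "is", "to", "and", "rt"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop the custom stopwords."""
    return [w for w in text.lower().split() if w not in CUSTOM_STOPWORDS]


def select_vocabulary(labelled_texts):
    """Keep words whose share of positive texts is strictly between 0% and 100%."""
    positive_counts = Counter()
    total_counts = Counter()
    for text, label in labelled_texts:
        for word in set(tokenize(text)):  # count each word once per text
            total_counts[word] += 1
            if label == "positive":
                positive_counts[word] += 1
    vocabulary = set()
    for word, total in total_counts.items():
        positive_share = positive_counts[word] / total
        if 0.0 < positive_share < 1.0:
            vocabulary.add(word)
    return vocabulary


def extract_features(text, vocabulary):
    """Bag-of-words features restricted to the filtered vocabulary."""
    return {word: True for word in tokenize(text) if word in vocabulary}


# Invented sample data, for illustration only.
SAMPLE = [
    ("the weather is good today", "positive"),
    ("the weather is bad today", "negative"),
    ("good luck to the team", "positive"),
    ("not good and not fair", "negative"),
]

vocabulary = select_vocabulary(SAMPLE)
print(sorted(vocabulary))
print(extract_features("the weather is bad", vocabulary))
```

On a sample this small the filter is very aggressive, since any word seen under only one label is dropped. With a larger corpus, a minimum-count threshold might be a gentler alternative.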
Before:
After: