We would like to delve deeper into text analysis and web scraping.
We scrape data from Twitter, based on hashtag searches, and use different techniques to clean, analyze and present the data.
Example tweets to perform sentiment analysis on could be:
Modules used: os, Path, sklearn
Note: Not all plots work with all data. A few cases might result in bad output.
Starting the server
cd into the modules folder.
Use python to run flask_service.py.
Wait until "Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)" appears in the terminal. This might take a while (~40 seconds), since the machine learning model is trained every time the server is started.
Using the endpoint
The server exposes a single endpoint, /api/sentiment, to which all requests must be made.
Use Postman or a similar tool to test the server at http://localhost:5000/api/sentiment, since we have not deployed the server. There is no UI for the server, so every request has to be made in a tool like Postman.
(Showing examples from Postman)
Select Preview and copy everything after "Example:". Paste it into the body of your request:
{
  "hashtags": [ "trump", "biden" ],
  "start_date": "2020-5-17",
  "end_date": "2020-5-22",
  "plot_type": "line",
  "remove_sentiment": "Uncertain",
  "tweet_count": 300,
  "fresh_search": true
}
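For readers who prefer a script to Postman, the same request could be built with Python's standard library. This is only a minimal sketch: the README does not say which HTTP method the endpoint expects or what the response looks like, so the POST method and the helper names build_request and send are assumptions made for illustration.

```python
import json
import urllib.request

# Address taken from this README; the server must be running locally.
API_URL = "http://localhost:5000/api/sentiment"


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap the search options in an HTTP request with a JSON body.

    Assumption: the endpoint accepts POST. The README does not name the method.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload: dict) -> bytes:
    """Send the request and return the raw response body (format not documented)."""
    with urllib.request.urlopen(build_request(payload)) as response:
        return response.read()


# The example body shown above, as a Python dict.
example_payload = {
    "hashtags": ["trump", "biden"],
    "start_date": "2020-5-17",
    "end_date": "2020-5-22",
    "plot_type": "line",
    "remove_sentiment": "Uncertain",
    "tweet_count": 300,
    "fresh_search": True,
}

request = build_request(example_payload)
# Calling send(example_payload) would contact the locally running server.
```

Building the request separately from sending it makes it possible to inspect the body and headers before anything goes over the network.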
Explanation of search options
Data gathering
"hashtags": [ "trump", "biden" ]
The hashtags to gather tweets for.
Data filtering
"start_date": "2020-5-17"
"end_date": "2020-5-22"
"plot_type": "line"
Available plot types are bar, line and pie. The line plot is recommended (the other types may not work).
"remove_sentiment": "Uncertain"
Removes the Positive tweets, the Negative tweets or the ones with a mixed sentiment (Uncertain).
"tweet_amount": 300
"fresh_search": true
"search_for": { "mentions": "@JoeBiden" }
"search_for": { "hashtags": "#trump" }
The key can be mentions or hashtags. The value should match the key, so if the key is mentions then the value must begin with @.
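The key/value rule for search_for can be checked before a request is sent. The sketch below is illustrative only: is_valid_search_for is a hypothetical helper, not part of the server, and the "#" prefix for hashtags is inferred from the "#trump" example rather than stated explicitly.

```python
# Prefix each search_for key is expected to require.
# "@" for mentions comes from the README; "#" for hashtags is an assumption
# based on the "#trump" example.
REQUIRED_PREFIX = {"mentions": "@", "hashtags": "#"}


def is_valid_search_for(search_for: dict) -> bool:
    """Return True if every key is known and its value starts with the matching prefix."""
    if not search_for:
        return False
    for key, value in search_for.items():
        prefix = REQUIRED_PREFIX.get(key)
        if prefix is None or not str(value).startswith(prefix):
            return False
    return True


ok_mentions = is_valid_search_for({"mentions": "@JoeBiden"})
ok_hashtags = is_valid_search_for({"hashtags": "#trump"})
bad_value = is_valid_search_for({"mentions": "JoeBiden"})
```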
Use the search_for filter with care (the mentions option in particular), since in most cases it filters away all the data, resulting in an empty plot or no plot at all.
"get_stats": "hashtags"
Accepted values are "hashtags" and "mentions".
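How the server computes these statistics is not described here. As a rough illustration of the idea, hashtags or mentions could be counted across tweet texts as in the sketch below. The function name count_tags, the regular expressions and the sample texts are all made up for this example.

```python
import re
from collections import Counter

# Simple patterns for illustration; real tweet parsing may need stricter rules.
PATTERNS = {
    "mentions": re.compile(r"@\w+"),
    "hashtags": re.compile(r"#\w+"),
}


def count_tags(texts, kind):
    """Count how often each mention or hashtag occurs across the given tweet texts."""
    pattern = PATTERNS[kind]  # raises KeyError for anything other than the two kinds
    counts = Counter()
    for text in texts:
        counts.update(pattern.findall(text))
    return counts


# Placeholder texts, not real tweets.
sample_texts = [
    "placeholder text @realDonaldTrump #trump",
    "another placeholder @realDonaldTrump @JoeBiden #biden",
]

mention_counts = count_tags(sample_texts, "mentions")
hashtag_counts = count_tags(sample_texts, "hashtags")
```

A count like mention_counts["@realDonaldTrump"] is the kind of number that would suggest a follow-up search for that mention.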
If, for example, the stats show that @realDonaldTrump has been mentioned ten times, then you can do a new search with these options to find the sentiment of those tweets:
{
  "hashtags": [ "trump", "biden" ],
  "start_date": "2020-5-17",
  "end_date": "2020-5-22",
  "plot_type": "line",
  "search_for": { "mentions": "@realDonaldTrump" },
  "tweet_count": 300,
  "fresh_search": false
}
Overall Recommendation
Search for the hashtags "trump" and "biden"
Remove the "Uncertain" sentiment
Use "line" as plot type
JSON:
{
  "hashtags": [ "trump", "biden" ],
  "start_date": "2020-5-12",
  "remove_sentiment": "Uncertain",
  "end_date": "2020-5-22",
  "plot_type": "line",
  "tweet_amount": 300
}
Run python app.py -h to print the help output.
All the optional arguments have default values.
The program can run using all default values by simply passing the hashtags you want to gather info from.
Using the default values to search for the hashtags #trump and #biden:
python app.py trump biden
This would run the program using the following values:
{'certainty_high': 0.75,
'certainty_low': 0.25,
'date': [datetime.date(2020, 5, 22),
datetime.date(2020, 5, 27)],
'fresh_search': False,
'hashtags': ['trump', 'biden'],
'plot_type': 'pie',
'remove_sentiment': None,
'save_plot': False,
'search_hashtags': None,
'search_mentions': None,
'search_urls': None,
'tweet_count': 300}
By default, date is a 5-day range based on the current day.
Changing plot type and filtering on dates (hashtags omitted for brevity)
python app.py -p bar -d 2020-06-01 2020-06-02
or
python app.py --plot bar --date 2020-06-01 2020-06-02
Search for a specific number of tweets (1000) and save the generated plots locally (hashtags omitted for brevity)
python app.py -s -c 1000
or
python app.py --save --count 1000
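To make the flags above easier to picture, here is a minimal argparse sketch of a command line with the same short and long options. It is not the project's app.py: the dest names mirror the keys in the default values shown earlier, the list of plot choices is borrowed from the server options, and the date default is left as None because the exact default range is computed at run time.

```python
import argparse
import datetime


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Sketch of the CLI described above")
    # One or more hashtags, passed without the leading '#'.
    parser.add_argument("hashtags", nargs="+", help="hashtags to gather tweets for")
    parser.add_argument("-p", "--plot", dest="plot_type",
                        choices=["bar", "line", "pie"], default="pie")
    # Two ISO dates (YYYY-MM-DD); the real default is a 5-day range, omitted here.
    parser.add_argument("-d", "--date", nargs=2, metavar=("START", "END"),
                        type=datetime.date.fromisoformat, default=None)
    parser.add_argument("-s", "--save", dest="save_plot", action="store_true")
    parser.add_argument("-c", "--count", dest="tweet_count", type=int, default=300)
    return parser


parser = build_parser()
defaults = parser.parse_args(["trump", "biden"])
custom = parser.parse_args(
    ["trump", "biden", "-p", "bar", "-d", "2020-06-01", "2020-06-02", "-s", "-c", "1000"]
)
```

Passing type=datetime.date.fromisoformat lets argparse reject malformed dates with its usual error message instead of failing later in the program.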