Description:
In this project, you will practice:
Think of some questions you want to answer, for example: the average rating and descriptions of IMDb movies in a given genre, the average sentiment of a Twitter user across time, or whether reviews for some product categories are higher than for others. Whatever comes to mind works. Write down your questions before starting the process.
Preferably, find data that is mainly text-based, such as hotel reviews, tweets, lines per character in a TV show, dialogues from a book, or WhatsApp/Telegram/Tinder conversations.
The data can be scraped or obtained through an API, but it doesn't have to be: you can also look for CSV files.
Clean and transform your data.
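A minimal cleaning sketch for text data, assuming each record is a Python dict with a text field (the field name `text` and the sample rows are made up for illustration). It normalizes Unicode, collapses whitespace, and drops rows that end up empty or duplicated, which also reduces insertion errors later on.

```python
import re
import unicodedata

def clean_text(text):
    """Normalize a raw text value: Unicode form, whitespace, and edges."""
    if text is None:
        return ""
    text = unicodedata.normalize("NFKC", text)  # unify look-alike characters
    text = re.sub(r"\s+", " ", text)  # collapse newlines, tabs and repeated spaces
    return text.strip()

def clean_rows(rows, text_field="text"):
    """Clean one text field per row; drop rows left empty and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        text = clean_text(row.get(text_field))
        if not text:
            continue
        new_row = {**row, text_field: text}
        key = tuple(sorted(new_row.items()))  # the whole cleaned row identifies a duplicate
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(new_row)
    return cleaned

# Made-up sample rows, e.g. as loaded from a CSV with csv.DictReader
raw = [
    {"user": "ana", "text": "  Great hotel,\n\nlovely staff!  "},
    {"user": "ana", "text": "Great hotel, lovely staff!"},
    {"user": "ben", "text": "Great hotel, lovely staff!"},
    {"user": "cai", "text": "   "},
    {"user": "dan", "text": None},
]
cleaned = clean_rows(raw)
print(cleaned)
```

Deduplicating on the whole row (rather than on the text alone) keeps identical messages written by different users, which matters if you later compare users.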
Load the data you obtain into SQL, through Python if possible. Should it be relational? Do you have entities like users, chats and messages, or characters, episodes and seasons? Does a relational model make sense, or is a single table enough? Define a DB diagram that makes sense. You may run into errors while inserting, because text can come in many different formats. You will either have to do nothing (the data will be formatted well enough) or transform it, in Python or outside of it.
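A sketch of a relational load using Python's built-in `sqlite3` module, assuming the TV-show example (characters, episodes, lines). SQLite stands in for whatever SQL engine you use; the table and column names and the sample rows are hypothetical. The `?` placeholders pass values to the driver instead of pasting them into the SQL string, which avoids most of the quoting errors that text data causes (apostrophes, double quotes, emoji).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to keep the database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE characters (
    character_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE episodes (
    episode_id INTEGER PRIMARY KEY,
    season INTEGER NOT NULL,
    number INTEGER NOT NULL,
    title TEXT,
    UNIQUE (season, number)
);
CREATE TABLE lines (
    line_id INTEGER PRIMARY KEY,
    episode_id INTEGER NOT NULL REFERENCES episodes(episode_id),
    character_id INTEGER NOT NULL REFERENCES characters(character_id),
    text TEXT NOT NULL
);
""")

def get_or_create_character(conn, name):
    """Insert the character if new, then return its id."""
    conn.execute("INSERT OR IGNORE INTO characters (name) VALUES (?)", (name,))
    row = conn.execute("SELECT character_id FROM characters WHERE name = ?", (name,)).fetchone()
    return row[0]

def insert_line(conn, episode_id, character_name, text):
    """Insert one line of dialogue; placeholders handle quotes and emoji safely."""
    character_id = get_or_create_character(conn, character_name)
    conn.execute(
        "INSERT INTO lines (episode_id, character_id, text) VALUES (?, ?, ?)",
        (episode_id, character_id, text),
    )

# Made-up sample data
conn.execute("INSERT INTO episodes (season, number, title) VALUES (?, ?, ?)", (1, 1, "Pilot"))
insert_line(conn, 1, "Alice", "It's fine, don't worry.")
insert_line(conn, 1, "Bob", 'She said "hi" 😊')
insert_line(conn, 1, "Alice", "See you tomorrow!")
conn.commit()
```

Seasons are kept as a column of `episodes` here; a separate `seasons` table is equally valid if seasons carry their own attributes.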
Now you have a clean database. Write queries, and subqueries where you can.
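A sketch of a subquery run from Python, answering a question like the product-category one above: which categories have an average rating above the overall average? The `reviews` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, category TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("p1", "books", 5), ("p2", "books", 4), ("p3", "toys", 2),
     ("p4", "toys", 3), ("p5", "games", 4)],
)

# The subquery computes the overall average once; HAVING compares each group to it
query = """
SELECT category, AVG(rating) AS avg_rating
FROM reviews
GROUP BY category
HAVING AVG(rating) > (SELECT AVG(rating) FROM reviews)
ORDER BY avg_rating DESC
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

With this sample data the overall average is 3.6, so only `books` (4.5) and `games` (4.0) are returned.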
Use another data source to enrich your analysis. It can be related just semantically; it does not need to share a key with your data. This will give you more material for Tableau visualizations.
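One way to relate a second source semantically, assuming both sources describe the same genres but name them differently: a small hand-written mapping table acts as the bridge. All table names, columns (including `search_interest`) and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (title TEXT, genre TEXT, rating REAL);
CREATE TABLE genre_trends (genre_name TEXT PRIMARY KEY, search_interest REAL);
CREATE TABLE genre_map (genre TEXT PRIMARY KEY, genre_name TEXT NOT NULL);
""")

# Source 1: your own data (made-up rows)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("Movie A", "Sci-Fi", 8.0), ("Movie B", "Sci-Fi", 6.0), ("Movie C", "Rom-Com", 7.0)],
)
# Source 2: an external dataset that labels genres differently (made-up rows)
conn.executemany(
    "INSERT INTO genre_trends VALUES (?, ?)",
    [("science fiction", 72.0), ("romantic comedy", 55.0)],
)
# Hand-written bridge between the two vocabularies
GENRE_MAP = {"Sci-Fi": "science fiction", "Rom-Com": "romantic comedy"}
conn.executemany("INSERT INTO genre_map VALUES (?, ?)", GENRE_MAP.items())

query = """
SELECT m.genre, AVG(m.rating) AS avg_rating, t.search_interest
FROM movies AS m
JOIN genre_map AS gm ON gm.genre = m.genre
JOIN genre_trends AS t ON t.genre_name = gm.genre_name
GROUP BY m.genre, t.search_interest
ORDER BY m.genre
"""
enriched = conn.execute(query).fetchall()
print(enriched)
```

Keeping the mapping in its own table (rather than hard-coding it in the query) makes it easy to extend and to export for Tableau.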
Generate a Tableau dashboard with the insights you found.
Extract data
Transform it through Python
Add another source of info if needed
Load that data into SQL through Python & SQL
Try to answer as many questions as you can by running SQL queries through Python: aggregations, averages, comparisons, filtered data, etc.
Export the results of those queries as CSV files.
In Tableau, create stories/dashboards.
Include the CSV files from your queries as well as the original, unqueried CSV, so you have more to visualize.
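A sketch of the query-and-export steps above, running several queries through Python and writing each result to a CSV for Tableau with only the standard library (`sqlite3`, `csv`). The `tweets` table, the query names and the `query_results` folder are assumptions; with pandas, `pandas.read_sql_query(sql, conn).to_csv(path, index=False)` does the same job.

```python
import csv
import sqlite3
from pathlib import Path

# Made-up data standing in for your cleaned database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (username TEXT, year INTEGER, sentiment REAL)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [("ana", 2021, 0.2), ("ana", 2021, 0.4), ("ana", 2022, -0.1), ("ben", 2022, 0.5)],
)

# One entry per question; the key becomes the CSV file name
QUERIES = {
    "avg_sentiment_by_year": """
        SELECT username, year, AVG(sentiment) AS avg_sentiment
        FROM tweets GROUP BY username, year ORDER BY username, year
    """,
    "tweet_counts": """
        SELECT username, COUNT(*) AS n_tweets
        FROM tweets GROUP BY username ORDER BY username
    """,
}

def export_query(conn, sql, path):
    """Run one query and write its header and rows to a UTF-8 CSV file."""
    cursor = conn.execute(sql)
    header = [column[0] for column in cursor.description]  # column names or aliases
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)  # the cursor yields one tuple per result row
    return path

out_dir = Path("query_results")
out_dir.mkdir(exist_ok=True)
for name, sql in QUERIES.items():
    export_query(conn, sql, out_dir / f"{name}.csv")
```

In Tableau, each file in `query_results` can then be added as a data source next to the original CSV.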
SQL
Python
Tableau