Data Analysis on the GAYBORHOODS dataset

During my second year of university, I was tasked with completing a data analysis project on a dataset of my choosing. This included data cleaning, exploratory data analysis, data visualization, & presentation. I have since added a wide variety of skills to my toolkit as a data scientist, such as constructing advanced linear regression models, classification models, decision trees, & simple neural networks. I have decided to use these tools to revisit this dataset & look for insights I missed the first time.

About The Project

The project is built in Python & focuses on pandas & Seaborn for data frames & visualization, statsmodels & scikit-learn for the ML models, and NumPy & Matplotlib as prerequisites for the others.


Original project by Sami Almuallim & Nat Scott.


At a glance

The project has 2 main outcomes: conclusions about trends in the data, stated as thesis statements, & visualizations that illustrate those conclusions & the final iteration of my analysis.

For example, my conclusion from the first section is that queer people's tax rates relative to straight people's are most strongly determined by the city they live in:

Part 1: Tax Analysis

Another is my conclusion from the second section that the size of queer communities is also best explained by the city they are in:

Part 2: Bars & Parades

Finally, I found that the political alignment & locale of queer community centers & districts differ depending on whether they are defined by the Kinsey index or by the GAYBORHOODS TOTINDEX: