During my second year of university, I was tasked with completing a data analysis project on a dataset of my choosing. This included data cleaning, exploratory data analysis, data visualization, & presentation. I have since added a wide variety of skills to my toolkit as a data scientist, such as constructing advanced linear regression models, classification models, decision trees, & simple neural networks. I have therefore decided to revisit this dataset with these tools to see whether I can find insights I missed the first time.
The project is going to be built in Python, & will focus on pandas & seaborn for data frames & visualization, statsmodels & scikit-learn for the ML models, & NumPy & Matplotlib as prerequisites for the others.
The project has 2 main outcomes: some conclusions in the form of thesis statements about trends in the data, & some visualizations to show my conclusions & the final iteration of my analysis.
Such as my conclusion from the first section that queer people's tax rates relative to straight people's are most strongly determined by which city they live in:
Or my conclusion from the second section that the magnitude of queer communities is also best analyzed in relation to which city you're in:
Or my insights into how the political alignment & location of queer communities differ between queer community centers & districts defined by the Kinsey index & those defined by the GAYBORHOODS TOTINDEX:
Or my final analysis of each city's geography, specifically the locations of queer communities (or GAYBORHOODS):
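As a rough illustration of how a conclusion like the first one above (that city is the strongest determinant of an outcome) could be checked, the sketch below compares how much of an outcome's variance each candidate grouping explains. It is not taken from the project's notebooks. The column names `city`, `age_band`, & `tax_gap` are hypothetical, the data is synthetic, & the measure used (eta squared, the between-group share of variance) is just one simple option.

```python
import numpy as np
import pandas as pd


def variance_explained(df, group_col, value_col):
    """Share of the variance in value_col explained by group means (eta squared)."""
    overall_mean = df[value_col].mean()
    total_ss = ((df[value_col] - overall_mean) ** 2).sum()
    # Give every row its group's mean, then measure how far those
    # group means sit from the overall mean.
    group_means = df.groupby(group_col)[value_col].transform("mean")
    between_ss = ((group_means - overall_mean) ** 2).sum()
    return between_ss / total_ss


# Synthetic, made-up data (NOT the project's dataset): the outcome is
# built to depend strongly on city and not at all on age_band.
rng = np.random.default_rng(0)
n = 600
city = rng.choice(["A", "B", "C"], size=n)
age_band = rng.choice(["18-34", "35-54", "55+"], size=n)
city_effect = pd.Series(city).map({"A": 0.0, "B": 5.0, "C": 10.0}).to_numpy()
tax_gap = city_effect + rng.normal(0.0, 1.0, size=n)

df = pd.DataFrame({"city": city, "age_band": age_band, "tax_gap": tax_gap})

# Score each hypothetical grouping and pick the one explaining the most variance.
scores = {col: variance_explained(df, col, "tax_gap") for col in ["city", "age_band"]}
strongest = max(scores, key=scores.get)
print(scores)
print("strongest grouping:", strongest)
```

On real data, the same comparison could be run across every candidate column, with a regression model from statsmodels or scikit-learn used to confirm the result once other variables are controlled for.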
Sami Almuallim - samialmuallim@gmail.com
Project Link: https://github.com/almsam/data-analysis-project-revised