Topic brainstorming - Githubissues

alisongh commented 2 years ago

House price prediction
COVID-19 vaccines analysis
Stock market prediction

alisongh commented 2 years ago

Make sure we don't use existing or published projects that are already online.

dingdingmammy commented 2 years ago

Project Idea: What gender is this name?

Inspiration: The Most Common Unisex Names In America: Is Yours One Of Them? https://fivethirtyeight.com/features/there-are-922-unisex-names-in-america-is-yours-one-of-them/

Dataset Sources:
• https://www.back4app.com/database/back4app/list-of-names-dataset
• https://data.world/datasets/names

Supervised Problem: Given a name, predict whether it is a male, female, or unisex name.
• Multiclass classification
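To make the supervised idea concrete, here is a minimal sketch of a name classifier built on character n-grams with a hand-rolled Naive Bayes. Everything in it is an assumption for illustration: the toy names and labels are made up (not taken from either dataset above), and `char_ngrams` and `NaiveBayesNameClassifier` are hypothetical helper names. A real project would more likely use a library model and the full dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n_values=(2, 3)):
    """Return character n-grams of a name, padded with ^ and $ boundary markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for n in n_values for i in range(len(padded) - n + 1)]


class NaiveBayesNameClassifier:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(name):
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, only to show the three-class setup.
toy_names = ["Anna", "Maria", "Sophia", "John", "Mark", "Peter", "Alex", "Jordan"]
toy_labels = ["female", "female", "female", "male", "male", "male", "unisex", "unisex"]
model = NaiveBayesNameClassifier().fit(toy_names, toy_labels)
print(model.predict("Taylor"))
```

The comic book character names could later be passed to `predict` the same way as unseen data.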

Unsupervised Problem: Cluster names to explore what groups emerge (names from similar roots? common vs. uncommon names? names of different origins?)
• Clustering

If you remember, we did a comic book exercise in our data viz class: https://fivethirtyeight.com/features/women-in-comic-books/ We can use these comic characters’ names as unseen data once the model is done.

dingdingmammy commented 2 years ago

Project Idea: Predict Rental Prices

The UMich library has a dataset in Data Planet available for download, but it requires manually clicking through all the states, counties, and bedroom types, exporting each one as Excel, and cleaning all the files. The data is an aggregated rental price for the year; sample CSV file below:

https://dataplanet-sagepub-com.proxy.lib.umich.edu/dataset?view=AAsBXQAAgACBAQAAAAAAAAAA3_zMslwIJ8Ve1X%24GFlaG5ZTarfaw7yXcCmqggQNxI9I661sSuksZTZNq74hjG5X3dcm31CHH7Gf0CLf3ei7LdYh%243oAAn0Wc_rkQYuAy2r%24U87JoZH8FCDLIwwQzETzROu%242GKukxrs1h5z5BenMQgpA%24DVFYJLBepppU%24Tc0x4GHldVhlNbs6ZMeMj1UnxIQ%24xb4%248mSeAvIfMi1LkChnYLhd_PMLMr1RKYiJE_PzeGitDnCQA

dingdingmammy commented 2 years ago

Project Idea: Emotion Recognition through Tweets

Inspiration: Analysis of Emotion Data: A Dataset for Emotion Recognition Tasks https://towardsdatascience.com/analysis-of-the-emotion-data-a-dataset-for-emotion-recognition-tasks-6b8c9a5dfe57

Dataset: https://huggingface.co/datasets/emotion

It's good that the data is already preprocessed and someone has already done some EDA on it. Although the dataset is also available on Kaggle, no one has done much with it yet.

dingdingmammy commented 2 years ago

Project Idea: India Bank Customer Segmentation

Source: https://www.kaggle.com/datasets/shivamb/bank-customer-segmentation

This has 1M+ transactions to play with. It's on Kaggle, and only two people have really done any clustering with it. Neither offers much in-depth interpretation of the results, they didn't do much feature engineering, and it feels like the ML techniques were applied only for the sake of applying them. With this much data, we can actually do a lot of things!

Here are some ideas suggested by the dataset's author:

Perform Clustering / Segmentation on the dataset and identify popular customer groups along with their definitions/rules
Perform Location-wise analysis to identify regional trends in India
Perform transaction-related analysis to identify interesting trends that can be used by a bank to improve/optimize their user experiences
Customer Recency, Frequency, Monetary analysis
Network analysis or graph analysis of customer data
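As a rough illustration of the Recency, Frequency, Monetary item in the list above, the sketch below computes the three values per customer from plain tuples. The tuple layout `(customer_id, transaction_date, amount)`, the `rfm_table` helper, and the sample rows are all assumptions for illustration; the actual Kaggle column names and date formats would need to be checked and parsed first.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot_date):
    """Compute recency (days), frequency (count), and monetary (sum) per customer.

    transactions: iterable of (customer_id, transaction_date, amount) tuples.
    snapshot_date: the reference date that recency is measured against.
    """
    grouped = defaultdict(list)
    for customer_id, tx_date, amount in transactions:
        grouped[customer_id].append((tx_date, amount))

    table = {}
    for customer_id, rows in grouped.items():
        last_date = max(tx_date for tx_date, _ in rows)
        table[customer_id] = {
            "recency_days": (snapshot_date - last_date).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return table


# Hypothetical sample rows, not taken from the dataset.
sample = [
    ("C1", date(2016, 8, 1), 100.0),
    ("C1", date(2016, 8, 10), 50.0),
    ("C2", date(2016, 8, 30), 20.0),
]
for customer, values in rfm_table(sample, date(2016, 8, 31)).items():
    print(customer, values)
```

The resulting three columns could then feed the clustering / segmentation step, for example after scaling.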

We can also do:

Customer Churn Analysis
Customer Next Return Date or Next Transaction Amount Prediction

dingdingmammy commented 2 years ago

Project Idea: Paid Parking Demand Prediction

Dataset: https://data.seattle.gov/Transportation/Paid-Parking-Transaction-Data/gg89-k5p6

The City of Seattle has made paid parking transaction data set available for public use for research and entrepreneurial purposes under the City’s Open Data Program. This dataset is derived from parking pay stations placed on streets within city limits and the paid-by-phone parking transactions. The dataset is downloaded nightly with the prior day’s paid parking transaction data.

My Comment: There are about 192K records, each detailing a meter transaction. We can do some clustering to see what patterns show up in the data, and we can use supervised learning to predict future meter usage for x periods ahead. The data is fairly clean, and not many people have worked with it yet.
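A possible starting point for the demand prediction part is sketched below: bucket transaction timestamps into hourly counts, then use a seasonal-naive baseline that repeats the value from one season earlier. This is only a baseline to compare real models against, not a proposed final method. The helper names `hourly_counts` and `seasonal_naive_forecast` and the sample values are hypothetical, and the timestamp field of the Seattle dataset is assumed to be parsed into `datetime` objects already.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count transactions per hour by truncating each timestamp to the hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def seasonal_naive_forecast(history, horizon, season_length):
    """Forecast `horizon` future periods by repeating the value one season back."""
    if season_length < 1 or len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    extended = list(history)
    for _ in range(horizon):
        # The value one season before the period being forecast.
        extended.append(extended[-season_length])
    return extended[len(history):]


# Hypothetical timestamps, not taken from the dataset.
stamps = [
    datetime(2022, 1, 1, 9, 5),
    datetime(2022, 1, 1, 9, 50),
    datetime(2022, 1, 1, 10, 0),
]
print(hourly_counts(stamps))

# With hourly data, season_length=24 repeats yesterday's pattern.
print(seasonal_naive_forecast([1, 2, 3, 4], horizon=6, season_length=4))
```

Any supervised model we try should at least beat this kind of baseline on held-out periods.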

dingdingmammy commented 2 years ago

The Seattle government has this open data program, and there is actually a lot of interesting, clean data we can explore!

https://data.seattle.gov/browse?sortBy=most_accessed&utf8=%E2%9C%93

alisongh commented 2 years ago

Project Idea: Predict Rental Prices

The UMich library has a dataset in Data Planet available for download, but it requires manually clicking through all the states, counties, and bedroom types, exporting each one as Excel, and cleaning all the files. The data is an aggregated rental price for the year; sample CSV file below:

https://dataplanet-sagepub-com.proxy.lib.umich.edu/dataset?view=AAsBXQAAgACBAQAAAAAAAAAA3_zMslwIJ8Ve1X%24GFlaG5ZTarfaw7yXcCmqggQNxI9I661sSuksZTZNq74hjG5X3dcm31CHH7Gf0CLf3ei7LdYh%243oAAn0Wc_rkQYuAy2r%24U87JoZH8FCDLIwwQzETzROu%242GKukxrs1h5z5BenMQgpA%24DVFYJLBepppU%24Tc0x4GHldVhlNbs6ZMeMj1UnxIQ%24xb4%248mSeAvIfMi1LkChnYLhd_PMLMr1RKYiJE_PzeGitDnCQA

This is similar to the house price prediction idea.

alisongh / SIADS-696-Milestone-II

Topic brainstorming #1