paradise1260 closed this issue 2 years ago.
Question: What features most affect a wine's quality rating? We could use logistic regression and examine which coefficients most affect the rating, or simply train a classification model using any of the more advanced models we have learned.
Over 6,000 examples and 12 features; the target is multiclass.
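A minimal sketch of the logistic-regression idea, written in plain Python so it runs without extra packages. It assumes the multiclass quality score is collapsed into a binary "good" label (quality >= 7, an arbitrary hypothetical cutoff), and the feature values below are made up rather than taken from the dataset. Features are standardized first, because raw coefficients are only comparable when the features share a scale. In practice a library such as scikit-learn would do the fitting.

```python
import math

def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mean) / std for v in col])
    return scaled

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Binary logistic regression trained with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "good"
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def rank_features(names, weights):
    """Order features by the magnitude of their coefficient."""
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)

# Made-up values for two features; quality >= 7 is a hypothetical "good" cutoff.
names = ["alcohol", "residual sugar"]
alcohol = [9.4, 9.8, 10.0, 9.5, 12.8, 12.5, 13.0, 11.9]
sugar = [1.9, 2.6, 2.3, 1.9, 2.1, 2.0, 2.4, 2.2]
quality = [5, 5, 5, 6, 7, 7, 8, 7]
labels = [1 if q >= 7 else 0 for q in quality]

rows = [list(r) for r in zip(*standardize([alcohol, sugar]))]
weights, bias = fit_logistic(rows, labels)
ranking = rank_features(names, weights)
print(ranking)
```

Sorting by absolute value matters because a large negative coefficient (a feature that lowers the odds of a good rating) is as informative as a large positive one.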
Question: Predict which stocks will have the highest rate of return in the following week, based on features and performance from the prior week. Alternatively, we could discuss which indicators are most likely to predict positive performance in the following week. This is a regression problem, not a classification problem.
750 examples, 16 features.
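If the stock data is not already laid out with next week's return next to this week's indicators, the target has to be shifted by one week per stock. A minimal sketch, assuming each record is a hypothetical (ticker, week, features, weekly_return) tuple with integer week numbers; the layout and values are illustrative only, not the dataset's schema.

```python
from collections import defaultdict

def make_next_week_pairs(records):
    """Pair each stock's week-t features with its week-(t+1) return.

    records: iterable of (ticker, week, features, weekly_return) tuples.
    This record layout is a hypothetical placeholder, not the dataset's schema.
    """
    by_ticker = defaultdict(list)
    for ticker, week, features, weekly_return in records:
        by_ticker[ticker].append((week, features, weekly_return))

    X, y = [], []
    for ticker in sorted(by_ticker):
        history = sorted(by_ticker[ticker], key=lambda r: r[0])  # order by week
        for (week, features, _), (next_week, _, next_return) in zip(history, history[1:]):
            # Only pair consecutive weeks so a gap never creates a bogus target.
            if next_week == week + 1:
                X.append(features)
                y.append(next_return)
    return X, y

# Made-up records for two hypothetical tickers.
records = [
    ("AA", 1, [0.5, 1.2], 0.01),
    ("AA", 2, [0.4, 1.1], -0.02),
    ("AA", 3, [0.6, 1.3], 0.03),
    ("BB", 1, [2.0, 0.9], 0.00),
    ("BB", 2, [2.1, 0.8], 0.05),
]
X, y = make_next_week_pairs(records)
print(X)  # [[0.5, 1.2], [0.4, 1.1], [2.0, 0.9]]
print(y)  # [-0.02, 0.03, 0.05]
```

Because the rows are time-ordered, splitting train and test by week (earlier weeks for training, later weeks for testing) would avoid leaking future information into the model.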
Online Shoppers Purchasing Intention
Question: Predict whether an online shopper will end up making a purchase.
The dataset has 12,330 examples and 17 features. The task is classification.
Question: Predict whether a client will default on their credit card payment.
The dataset has 30000 examples and 23 features. The task is classification.
I have found two suitable datasets. Reasons: both datasets are relatively clean and easy to work with, and their description pages (linked below) are worth reading, as they are very clear and straight to the point. data.info() gives an overview of each raw dataset.
Predict student performance (regression problem): https://archive-beta.ics.uci.edu/ml/datasets/student+performance
data.info():
RangeIndex: 649 entries, 0 to 648
Data columns (total 33 columns):
Column Non-Null Count Dtype
0 school 649 non-null object
1 sex 649 non-null object
2 age 649 non-null int64
3 address 649 non-null object
4 famsize 649 non-null object
5 Pstatus 649 non-null object
6 Medu 649 non-null int64
7 Fedu 649 non-null int64
8 Mjob 649 non-null object
9 Fjob 649 non-null object
10 reason 649 non-null object
11 guardian 649 non-null object
12 traveltime 649 non-null int64
13 studytime 649 non-null int64
14 failures 649 non-null int64
15 schoolsup 649 non-null object
16 famsup 649 non-null object
17 paid 649 non-null object
18 activities 649 non-null object
19 nursery 649 non-null object
20 higher 649 non-null object
21 internet 649 non-null object
22 romantic 649 non-null object
23 famrel 649 non-null int64
24 freetime 649 non-null int64
25 goout 649 non-null int64
26 Dalc 649 non-null int64
27 Walc 649 non-null int64
28 health 649 non-null int64
29 absences 649 non-null int64
30 G1 649 non-null int64
31 G2 649 non-null int64
32 G3 649 non-null int64
dtypes: int64(16), object(17)
memory usage: 167.4+ KB
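The data.info() output for the student dataset shows 17 object (string) columns, which most regression models cannot use directly. A minimal sketch of one-hot encoding in plain Python; the three rows below are made up, and only the column names come from the listing. With pandas, pd.get_dummies(data, columns=[...]) does the same job.

```python
def one_hot_encode(rows, categorical):
    """Replace each categorical column with one 0/1 indicator column per level."""
    # Collect the distinct levels of every categorical column, sorted for a stable order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}
    encoded = []
    for row in rows:
        # Keep numeric columns unchanged.
        new_row = {k: v for k, v in row.items() if k not in categorical}
        for col in categorical:
            for level in levels[col]:
                new_row[f"{col}_{level}"] = 1 if row[col] == level else 0
        encoded.append(new_row)
    return encoded

# Made-up rows; only the column names come from the data.info() listing.
rows = [
    {"school": "GP", "sex": "F", "age": 18, "G3": 11},
    {"school": "MS", "sex": "M", "age": 17, "G3": 14},
    {"school": "GP", "sex": "M", "age": 16, "G3": 9},
]
encoded = one_hot_encode(rows, ["school", "sex"])
print(encoded[0])
# {'age': 18, 'G3': 11, 'school_GP': 1, 'school_MS': 0, 'sex_F': 1, 'sex_M': 0}
```

For a linear regression, dropping one indicator per column (pd.get_dummies(..., drop_first=True)) avoids perfectly collinear columns.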
Classify origin of wines (classification problem): https://archive-beta.ics.uci.edu/ml/datasets/wine
Uses 13 features to classify each wine into one of three origins; 178 entries:
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
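Because the 13 wine-origin features can sit on very different numeric scales (Proline versus Hue, for example), distance-based models need standardized inputs. A minimal sketch of a nearest-centroid classifier, chosen here only as a simple baseline since the thread does not prescribe a model. It uses two of the listed features, Alcohol and Proline, with made-up values, and labels the three origins 1, 2 and 3.

```python
import math

def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds

def scale(row, means, stds):
    """Standardize one row with previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def fit_centroids(rows, labels):
    """Mean (centroid) of the scaled rows belonging to each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(row, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))

# Made-up [Alcohol, Proline] values for the three origins (labels 1, 2, 3).
train_rows = [[14.2, 1100], [13.9, 1050], [12.3, 500], [12.0, 450], [13.1, 650], [13.3, 700]]
train_labels = [1, 1, 2, 2, 3, 3]

means, stds = fit_scaler(train_rows)
centroids = fit_centroids([scale(r, means, stds) for r in train_rows], train_labels)

new_wines = [[14.0, 1080], [12.1, 480], [13.2, 680]]
predictions = [predict(scale(r, means, stds), centroids) for r in new_wines]
print(predictions)  # [1, 2, 3]
```

The scaler is fitted on the training rows and reused for new wines, so no information from the wines being classified leaks into the scaling.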
Notes: I also think the wine quality dataset is interesting (Regression problem). https://archive-beta.ics.uci.edu/ml/datasets/wine+quality
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
Column Non-Null Count Dtype
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
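As a first look at which of these columns most affect quality, each feature can be ranked by its Pearson correlation with quality. A minimal plain-Python sketch; the six rows are made up, and only the column names come from the data.info() listing. Correlation only captures linear association, so it is a screening step rather than a measure of causal effect.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_by_correlation(columns, target):
    """Sort features by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up values; only the column names come from the data.info() listing.
quality = [5, 5, 6, 6, 7, 8]
columns = {
    "alcohol": [9.4, 9.8, 10.5, 11.0, 12.0, 12.8],
    "volatile acidity": [0.70, 0.66, 0.55, 0.50, 0.40, 0.35],
    "pH": [3.51, 3.20, 3.40, 3.30, 3.36, 3.29],
}
ranking = rank_by_correlation(columns, quality)
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

On the real DataFrame, data.corr()["quality"] returns the same Pearson coefficients for every column at once.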
I was thinking of including both datasets to form one research idea, since they are closely related, in case we run short of work. We could also just pick one. :)
Here are my top 4:
My top 4:
My top 4:
All the datasets we chose seem very interesting, but I think a larger dataset might be better, so here is my ranking:
My top 4:
Looks like we have a winner! I love me some wine... 👍
My follow-up for the formal question: predict a wine's quality from a set of chemical composition features.
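For the formal regression question, any model should at least beat a trivial baseline that always predicts the mean training quality. A minimal sketch of a shuffled hold-out split and the mean absolute error (MAE) of that baseline; the quality scores are made up, the split function is a hand-rolled stand-in, and the 25% test fraction is an arbitrary assumption.

```python
import random

def train_test_split(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up quality scores standing in for the dataset's target column.
qualities = [5, 6, 5, 7, 6, 5, 6, 8]
train, test = train_test_split(qualities)

# Baseline: always predict the mean quality seen in training.
baseline = sum(train) / len(train)
baseline_mae = mean_absolute_error(test, [baseline] * len(test))
print(f"baseline MAE: {baseline_mae:.2f}")
```

A model trained on the chemical features is only worth keeping if its MAE on the same test split is clearly lower than baseline_mae.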
Possible stretch questions:
Possible repo rename: Wine Quality Predictor
Hello team,
Please provide 2 research questions and datasets under this issue.