UBC-MDS / DSCI522_group17

Group work for DSCI522. Our group number is 17.
MIT License

Imbalanced classes #6

Open · pan1fan2 opened this issue 4 years ago

pan1fan2 commented 4 years ago

Hi All,

I defined a wine as "bad" if it had a quality score below 5, "good" if the score was at least 5 but below 7, and "excellent" otherwise.
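A minimal sketch of this labelling rule in plain Python, assuming a score of exactly 5 counts as "good" and anything 7 or above as "excellent". `quality_to_label` is a hypothetical helper, not something already in our repo.

```python
def quality_to_label(score: int) -> str:
    """Map a wine quality score to a class label (hypothetical helper).

    Assumed bins: below 5 -> "bad", 5 or 6 -> "good", 7 and above -> "excellent".
    """
    if score < 5:
        return "bad"
    if score < 7:
        return "good"
    return "excellent"


scores = [3, 4, 5, 6, 7, 8]
labels = [quality_to_label(s) for s in scores]
print(labels)
```

On a pandas column, `pd.cut(..., bins=[0, 5, 7, 11], right=False, labels=["bad", "good", "excellent"])` should produce the same left-closed bins.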

The red_wine and white_wine datasets are combined into full_data.
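A sketch of the combining step, assuming each dataset is a list of row dicts, that tags every row with a `wine_type` so the model can use it later. The rows below are made-up toy values, not the real data, and `combine` is a hypothetical name.

```python
# Made-up toy rows standing in for the two datasets (not real values).
red_wine = [{"alcohol": 9.4, "quality": 5}, {"alcohol": 9.8, "quality": 6}]
white_wine = [{"alcohol": 8.8, "quality": 6}, {"alcohol": 10.1, "quality": 7}]


def combine(red, white):
    """Stack both datasets into one list, tagging each row with its wine type."""
    full = []
    for wine_type, rows in (("red", red), ("white", white)):
        for row in rows:
            # Copy the row so the original datasets are left unchanged.
            full.append({**row, "wine_type": wine_type})
    return full


full_data = combine(red_wine, white_wine)
print(full_data)
```

With pandas DataFrames, `pd.concat([red_wine.assign(wine_type="red"), white_wine.assign(wine_type="white")], ignore_index=True)` does the same; `wine_type` would still need to be encoded (e.g. one-hot) before fitting a model.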

After a quick EDA, it looks like we are going to have a class imbalance problem. Here are my questions:

  1. What model should we use? Decision Tree, Random Forest...?
  2. Do we care about the wine type when we set up the ML model?
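Whichever model we pick for question 1, one common way to handle the imbalance is to weight classes inversely to their frequency. As far as I know, scikit-learn's `class_weight="balanced"` (accepted by both `DecisionTreeClassifier` and `RandomForestClassifier`) uses `n_samples / (n_classes * count)`; the sketch below reproduces that formula in plain Python on hypothetical labels.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels, not our actual class distribution.
labels = ["good"] * 6 + ["bad"] * 2 + ["excellent"] * 4
weights = balanced_class_weights(labels)
print(weights)
```

Rarer classes get larger weights, so misclassifying them costs more during training.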
jachang0628 commented 4 years ago

Regarding 1, we could try multiple models and pick the best-performing one, i.e., try all the classification models we have learned so far. Regarding 2, including the wine type as a new column would be a good idea, since different types of wine have different chemical compositions.
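When picking the best-performing model, plain accuracy can be misleading with imbalanced classes: a model that always predicts the majority class still scores well. A sketch of macro-averaged F1 (the unweighted mean of per-class F1), which penalizes ignoring a minority class; the labels here are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        # Treat undefined precision/recall (0/0) as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels: a "model" that always predicts the majority class.
y_true = ["good"] * 8 + ["bad"] * 2
always_good = ["good"] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_good)) / len(y_true)
print(accuracy, macro_f1(y_true, always_good))
```

In scikit-learn this should correspond to `f1_score(y_true, y_pred, average="macro")`, or `scoring="f1_macro"` in `cross_val_score`.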

jachang0628 commented 4 years ago

Btw, just out of curiosity, what is the class distribution? I'm assuming good-quality wine has far fewer observations than the others.
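A quick way to check the class distribution, sketched in plain Python with hypothetical labels; on a pandas Series, `value_counts(normalize=True)` gives the same proportions.

```python
from collections import Counter


def class_distribution(labels):
    """Return (class, share of observations) pairs, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cls, count / total) for cls, count in counts.most_common()]


# Hypothetical labels, not our actual data.
labels = ["good", "good", "good", "bad", "excellent", "good"]
dist = class_distribution(labels)
for cls, share in dist:
    print(f"{cls}: {share:.1%}")
```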

pan1fan2 commented 4 years ago

The distribution looks like this:

[Image: class distribution]

jachang0628 commented 4 years ago

Thanks, Pan!