Open pan1fan2 opened 3 years ago
Regarding 1, we could try multiple models and pick the best-performing one, so we could try all the classification models we have learned so far. Regarding 2, including the wine type as a new column is a good idea, since different types of wine have different chemical compositions.
Btw, just out of curiosity, what is the class distribution? I'm assuming good-quality wines have far fewer observations than the others.
The distribution looks like this ~
Thanks Pan
Hi All,
I labeled a wine "bad" if its quality score is less than 5, "good" if it is greater than 5 but less than 7, and "excellent" otherwise.
The red_wine and white_wine datasets are combined into full_data.
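For anyone who wants to reproduce the labeling and the class counts, here is a rough sketch in plain Python. The rows are made-up placeholders, not the real data, and the `wine_type` and `label` field names are hypothetical. One assumption to flag: the rule above does not say where a score of exactly 5 goes, so this sketch counts it as "good".

```python
from collections import Counter


def quality_label(score):
    """Map a numeric quality score to a class label.

    Assumption: a score of exactly 5 is treated as "good", because the
    rule in the post leaves that boundary open.
    """
    if score < 5:
        return "bad"
    if score < 7:
        return "good"
    return "excellent"


# Made-up rows standing in for the real red_wine / white_wine datasets.
red_wine = [{"quality": 4}, {"quality": 5}, {"quality": 6}]
white_wine = [{"quality": 6}, {"quality": 7}, {"quality": 3}, {"quality": 6}]

# Combine both datasets, tagging every row with its wine type and label.
full_data = [
    {**row, "wine_type": wine_type, "label": quality_label(row["quality"])}
    for wine_type, rows in (("red", red_wine), ("white", white_wine))
    for row in rows
]

# Count how many rows fall into each class to see the imbalance.
class_counts = Counter(row["label"] for row in full_data)
print(class_counts)
```

Running the same count on the real full_data would show how skewed the three classes are.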
After a quick EDA, it looks like we are going to have a class imbalance problem. Here are my questions: