esciencecenter-digital-skills / scikit-learn-mooc

Lesson to teach machine learning in Python with scikit-learn in a 2-day workshop
https://esciencecenter-digital-skills.github.io/scikit-learn-mooc/
Creative Commons Attribution 4.0 International

Another dataset for final exercise? #35

Open svenvanderburg opened 6 months ago

svenvanderburg commented 6 months ago

The penguins dataset as currently used only has 2 numerical features and no categorical features.

We could consider a more difficult dataset, or use the same dataset with more of its features (including 'sex' and 'island').

svenvanderburg commented 5 months ago

I propose the dataset 'house_prices.csv' that is already in the repo: https://github.com/esciencecenter-digital-skills/scikit-learn-mooc/blob/main/datasets/house_prices.csv

See its description here: https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html

maltelueken commented 5 months ago

The house_prices.csv data set looks nice, but it has a similar drawback: it only contains numerical features.

maltelueken commented 5 months ago

Never mind, house_prices.csv is actually the Ames houses data set, with both numerical and categorical features. I think the link to the description above should be this one, right?

carschno commented 5 months ago

I agree that house_prices.csv looks good. I see all the columns (including categorical and numerical features) in the file linked above.
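
For anyone who wants to check the mix of feature types locally, here is a minimal sketch using only the standard library. It treats a column as numerical when every non-missing value parses as a float. The helper names (`is_number`, `split_feature_types`), the missing-value markers and the toy column names are assumptions for illustration, not taken from the actual file.

```python
import csv
import io

# Assumption: values that should count as "missing" rather than as text.
MISSING = {"", "NA"}


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_feature_types(csv_file):
    """Return (numerical, categorical) column names from an open CSV file."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    # Start by assuming every column is numerical, then rule columns out.
    numeric = {name: True for name in columns}
    for row in reader:
        for name in columns:
            value = row[name] or ""
            if value not in MISSING and not is_number(value):
                numeric[name] = False
    numerical = [c for c in columns if numeric[c]]
    categorical = [c for c in columns if not numeric[c]]
    return numerical, categorical


# Toy data with made-up column names, standing in for the real file.
toy_csv = io.StringIO(
    "area,rooms,neighborhood,price\n"
    "120.5,4,north,250000\n"
    "NA,3,south,180000\n"
)
numerical, categorical = split_feature_types(toy_csv)
print(numerical)
print(categorical)
```

To run it on the real data, pass a file opened with `open("datasets/house_prices.csv", newline="")` instead of the toy string. Note that a column consisting only of missing values would be reported as numerical by this sketch. In the lesson itself, pandas' `DataFrame.select_dtypes` is probably the more natural way to do the same check.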

svenvanderburg commented 5 months ago

Sorry for the confusion. It is indeed the Ames houses data set, and I included the wrong link. You are right @maltelueken!

svenvanderburg commented 5 months ago

I will proceed with the Ames houses dataset then.