Status: Open. svenvanderburg opened this issue 6 months ago.
I propose using the 'house_prices.csv' dataset, which is already in the repo: https://github.com/esciencecenter-digital-skills/scikit-learn-mooc/blob/main/datasets/house_prices.csv
See its description here: https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html
The house_prices.csv data set looks nice, but it has a similar drawback: it only contains numerical features.
Never mind, house_prices.csv is actually the Ames Houses data set, with both numerical and categorical features. I think the link to the description above should be this one, right?
I also agree that house_prices.csv looks good.
I see all the columns (including categorical and numerical features) in the file linked above.
Sorry for the confusion. It is indeed the Ames houses data set, and I included the wrong link. You are right, @maltelueken!
I will proceed with the Ames houses dataset then.
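For anyone who wants to double-check the point discussed above (that the file mixes numerical and categorical features), here is a minimal sketch using only the Python standard library. The helper name `split_feature_types`, the column names, and the toy rows are assumptions made up for illustration; they are not taken from house_prices.csv. To inspect the real file, pass its text to the same function instead of the sample.

```python
import csv
import io


def _is_number(text):
    """Return True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_feature_types(csv_text):
    """Split CSV columns into (numerical, categorical) column names.

    A column counts as numerical only if every non-empty value parses
    as a float; anything else is treated as categorical. This is a
    rough heuristic, not a replacement for reading the data description.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numerical, categorical = [], []
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row[column] != ""]
        if values and all(_is_number(value) for value in values):
            numerical.append(column)
        else:
            categorical.append(column)
    return numerical, categorical


# Toy rows for illustration only (hypothetical column names and values).
sample = """LotArea,Neighborhood,SalePrice
1000,North,150000
2000,South,250000
"""

numerical, categorical = split_feature_types(sample)
print("numerical:", numerical)
print("categorical:", categorical)
```

Note that integer-coded categories would be reported as numerical by this heuristic, so the result is only a first check.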
The penguins dataset currently in use only has 2 numerical features and no categorical features.
We could consider using a more difficult dataset, or using the penguins dataset with more features (including 'sex' and 'island').