BenChehade / datasciences

attempt at data science competitions - mostly kaggle
MIT License

Mathematics #2

Open shutUpAndCode opened 7 years ago

shutUpAndCode commented 7 years ago

So the data we have is as follows: 23 categorical variables, 24 ordinal variables, 19 continuous variables and 13 discrete (numerical) variables. That is 79 predictors in total, plus 1 dependent variable (house price).

shutUpAndCode commented 7 years ago

I'm fairly happy with everything but the ordinal variables. I found a good paper about them, but has anyone built regression models with ordinal variables before?

What I'm thinking as a starter for 10 (ignoring the ordinal issue for the moment): binarize (one-hot encode) the categorical variables, check whether any of our variables are highly correlated and drop one from each such pair, then fit some kind of regularized regression. Lasso and ridge seem like a good start: lasso can shrink coefficients to exactly zero, so it doubles as feature selection, while ridge only shrinks them.
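A minimal sketch of that three-step starter, assuming pandas and scikit-learn are available. The toy columns (`area`, `area_sqft`, `rooms`, `zone`), the 0.9 correlation threshold and the helper `drop_correlated` are illustrative assumptions, not anything taken from the competition data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from each pair whose absolute correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


# Toy data standing in for the real training set (all values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(150, 30, n),
    "rooms": rng.integers(2, 7, n),
    "zone": rng.choice(["A", "B", "C"], n),
})
df["area_sqft"] = df["area"] * 10.764  # perfectly correlated with "area"
price = 1000 * df["area"] + 5000 * df["rooms"] + rng.normal(0, 1000, n)

# 1. Binarize (one-hot encode) the categorical column.
X = pd.get_dummies(df, columns=["zone"], dtype=float)

# 2. Remove one column from each highly correlated pair.
X = drop_correlated(X, threshold=0.9)

# 3. Fit a lasso with a cross-validated penalty on standardized features;
#    features whose coefficient is exactly zero have been dropped by the model.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, price)
coefs = pd.Series(model[-1].coef_, index=X.columns)
selected = coefs[coefs != 0].index.tolist()
```

Standardizing before the lasso matters because the penalty acts on coefficient size, which depends on each feature's scale. Swapping `LassoCV` for `RidgeCV` would give the ridge variant, which keeps every feature but shrinks the coefficients.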

DataMonsterBoy commented 7 years ago

For now, can't we just treat them as equally spaced? For example, good, neutral and bad would become 1, 2 and 3. Choosing the spacing in a clever way would be a separate modelling problem, which could be useful to solve but is probably best left for now. What are your thoughts?
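The equally spaced encoding could look something like this in plain Python. The scale `QUALITY_ORDER` and the helper `encode_ordinal` are hypothetical names used only for illustration, and the real ordinal columns may have different levels.

```python
# Hypothetical quality scale, ordered as in the example above.
QUALITY_ORDER = ["good", "neutral", "bad"]

# Assign consecutive integers starting at 1, so the levels are equally spaced.
QUALITY_CODES = {level: rank for rank, level in enumerate(QUALITY_ORDER, start=1)}


def encode_ordinal(values, codes):
    """Map each ordinal label to its integer code; unknown or missing labels become None."""
    return [codes.get(v) for v in values]


column = ["bad", "good", "neutral", "good", None]
encoded = encode_ordinal(column, QUALITY_CODES)
```

With this encoding a linear model fits a single slope per ordinal variable, which implicitly assumes that moving from good to neutral changes the price by the same amount as moving from neutral to bad.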