[x] convert "Less than 1 year" and "More than 50 years" to 0.5 and 51 for "YearsCode" and "YearsCodePro" columns
[x] remove countries with fewer rows than a certain threshold (500?)
[x] combine country and US state?
Michigan, USA -> United States of America:Michigan
Just USA, no state -> United States of America:
This may split the US into too many different features, most without enough data, so it might not be feasible. Could potentially leave US states instead, or just use country.
[x] countries (one hot encoding? or too many countries?)
[x] simplify education to: No college, Associates, Bachelors, Masters, PhD and convert to 0,1,2,3,4 (manual, or via LabelEncoder from sklearn.preprocessing)
[ ] handle DevType (simplify, one hot encode)
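
A rough pandas sketch of a few of the steps above (years conversion, country threshold, education codes, one hot countries), not the actual project code. Assumptions: the data sits in a pandas DataFrame; `Country` and `EdLevel` are placeholder column names (only `YearsCode` and `YearsCodePro` come from the notes); education has already been simplified to the five labels listed. The education step uses the manual mapping because `LabelEncoder` assigns codes in sorted label order, which would put "No college" at 3 instead of 0.

```python
import pandas as pd

# Text answers from the notes and their numeric stand-ins.
YEARS_TEXT = {"Less than 1 year": 0.5, "More than 50 years": 51}

# Ordered from lowest to highest so the integer codes keep their meaning.
EDUCATION_ORDER = ["No college", "Associates", "Bachelors", "Masters", "PhD"]
EDUCATION_CODES = {label: code for code, label in enumerate(EDUCATION_ORDER)}


def convert_years(df, columns=("YearsCode", "YearsCodePro")):
    """Swap the two text answers for numbers, then make the columns numeric."""
    out = df.copy()
    for col in columns:
        swapped = out[col].map(lambda v: YEARS_TEXT.get(v, v))
        # Anything that still is not a number becomes NaN.
        out[col] = pd.to_numeric(swapped, errors="coerce")
    return out


def drop_rare_countries(df, column="Country", min_rows=500):
    """Keep only rows whose country appears at least min_rows times."""
    counts = df[column].value_counts()
    keep = counts[counts >= min_rows].index
    return df[df[column].isin(keep)].copy()


def encode_education(df, column="EdLevel"):
    """Map the simplified education labels to 0..4 (unknown labels -> NaN)."""
    out = df.copy()
    out[column] = out[column].map(EDUCATION_CODES)
    return out


def one_hot_countries(df, column="Country"):
    """One indicator column per remaining country, e.g. Country_<name>."""
    return pd.get_dummies(df, columns=[column])
```

Running `drop_rare_countries` before `one_hot_countries` keeps the number of indicator columns manageable, which is the "too many countries?" worry above.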