Open Tommicus opened 7 years ago
Have you tried classification trees? We found the C5.0 package very good at predicting the correct class on the first run. Our files are here: https://github.com/egor-gazarov/PredictingEmployeesLeave-INSEAD17J-GP
Any advice on the Type 1 error please?
https://github.com/Tommicus/GOLDHR/blob/master/GOLDHR.Rmd
In the classification analysis:
We used the code from the course website, adjusted for our data set and our problem.
CART1, CART2 and the logistic regression all return -1.00 as the variable importance of the first independent variable in the set (Satisfaction level in our base case). We tried several different independent variables.
The validation confusion matrix shows a huge Type 1 error (99.58%: people who stayed although we predicted them to leave) but a relatively small Type 2 error (8.43%). If we increase the probability threshold, the Type 1 error doesn’t decrease, which is counterintuitive.
The confusion matrix for the test set gives much the same result.
Can this be because some of the logistic regression coefficients are not significant? If we exclude those variables, the results are much the same.
The logistic regression also misclassifies a lot of cases.
Any help is much appreciated.
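One way to sanity-check the threshold behaviour described above is to recompute the error rates by hand over a few thresholds. The sketch below is in Python rather than the R used in the linked notebook, and everything in it is an assumption made for illustration: the toy labels and probabilities, the helper name `error_rates`, the choice of "leaves" (coded 1) as the positive class, and the reading of Type 1 error as FP / (FP + TN). Under that reading the Type 1 rate cannot go up when the threshold goes up, so a rate that stays near 100% may point to swapped class labels, a probability for the wrong class, or transposed rows and columns in the confusion matrix, rather than to insignificant coefficients.

```python
def error_rates(y_true, p_leave, threshold):
    """Return (type1_rate, type2_rate) with 1 = 'leaves' as the positive class.

    type1_rate = FP / (FP + TN): stayers wrongly predicted to leave.
    type2_rate = FN / (FN + TP): leavers wrongly predicted to stay.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_leave):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    type1 = fp / (fp + tn) if (fp + tn) else 0.0
    type2 = fn / (fn + tp) if (fn + tp) else 0.0
    return type1, type2


# Toy data (hypothetical, not taken from the GOLDHR data set).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
p_leave = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.70, 0.80, 0.95]

# Sweep the threshold: the Type 1 rate should never increase.
thresholds = [0.3, 0.5, 0.7]
sweep = [(t,) + error_rates(y_true, p_leave, t) for t in thresholds]
for t, type1, type2 in sweep:
    print(f"threshold={t:.1f}  type1={type1:.2%}  type2={type2:.2%}")

# Same labels, but scored with the probability of the *other* class,
# to show what an accidental flip does to the Type 1 rate.
p_flipped = [1 - p for p in p_leave]
flipped_type1, flipped_type2 = error_rates(y_true, p_flipped, 0.5)
print(f"flipped: type1={flipped_type1:.2%}  type2={flipped_type2:.2%}")
```

If the notebook instead reports Type 1 error as FP divided by the number of predicted leavers, the rate is not guaranteed to fall as the threshold rises, so it is worth confirming which denominator the course code uses before comparing numbers.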