In Data Frame Analytics job setup, when unchecking a field X (e.g. `Cancelled` in the screenshot below) and then filtering for `is not included`, the expected behavior is to see field X in that list.
However, that is not currently the case (`Cancelled` does not appear in the filtered list, see screenshot below):
This is confusing. It is important to be clear about which variables are excluded before creating the model, because including ones that are tightly correlated with the dependent variable is "cheating", and including ones that are correlated with each other undermines the model's performance.
I believe the list only shows the fields that were excluded automatically by the system, not the ones users excluded by unchecking them. If that's the case, we should make that clear or, even better, fix the filter so it applies to all included/excluded fields. cc @alvarezmelissa87
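
To make the expected behavior concrete, here is a minimal sketch of a filter that treats system-excluded and user-unchecked fields the same way. This is not the actual Kibana code: `FieldRow`, `includedBySystem`, `checkedByUser`, `isIncluded`, and `filterNotIncluded` are hypothetical names, and the field names other than `Cancelled` are only illustrative.

```typescript
// Hypothetical shape of one row in the field-selection table (not Kibana's actual type).
interface FieldRow {
  name: string;
  // false when the system excluded the field automatically
  includedBySystem: boolean;
  // false when the user unchecked the field in the table
  checkedByUser: boolean;
}

// A field counts as included only if neither the system nor the user excluded it.
function isIncluded(row: FieldRow): boolean {
  return row.includedBySystem && row.checkedByUser;
}

// The "is not included" filter, applied to every excluded field
// regardless of who excluded it.
function filterNotIncluded(rows: FieldRow[]): FieldRow[] {
  return rows.filter((row) => !isIncluded(row));
}

// Illustrative data: one user-unchecked field, one system-excluded field,
// and one field that is still included.
const rows: FieldRow[] = [
  { name: "Cancelled", includedBySystem: true, checkedByUser: false },
  { name: "FlightNum", includedBySystem: false, checkedByUser: true },
  { name: "DistanceKilometers", includedBySystem: true, checkedByUser: true },
];

const notIncluded: string[] = filterNotIncluded(rows).map((row) => row.name);
console.log(notIncluded);
```

With a filter along these lines, `Cancelled` would show up under `is not included` as soon as it is unchecked, alongside any fields the system excluded on its own.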