An unseen value is a value that is passed in during the transform step but was not present in the fit step input. Previously, RobustOrdinalEncoder mapped unseen values to the integer n, where n is the number of categories. This PR changes that behavior so that unseen values are mapped to np.nan. The new behavior produces better prediction quality, since XGBoost can handle np.nan values. The old behavior is still available by setting unknown_as_nan to False.
With unknown_as_nan set to False (the old behavior), for a given column of n unique values, seen values are mapped to the integers 0 to n-1 and unseen values are mapped to the integer n.
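The two mappings described above can be illustrated with a minimal sketch. This is not the RobustOrdinalEncoder implementation: encode_column is a hypothetical stand-alone helper written only for this illustration, and the sorted category order is an assumption. Only the unknown_as_nan flag name comes from this PR.

```python
import numpy as np


def encode_column(fit_values, transform_values, unknown_as_nan=True):
    """Toy per-column ordinal mapping (hypothetical helper, not the library code)."""
    # Categories seen at fit time; sorting is an assumption made here so that
    # the codes 0..n-1 are deterministic.
    categories = sorted(set(fit_values))
    codes = {cat: i for i, cat in enumerate(categories)}

    # Unseen values map to np.nan (new behavior) or to n (old behavior).
    unseen_code = np.nan if unknown_as_nan else len(categories)

    # Use a float array so that np.nan can be represented.
    return np.array([codes.get(v, unseen_code) for v in transform_values], dtype=float)


# "z" was not present at fit time, so it is an unseen value.
new = encode_column(["a", "b", "c"], ["a", "c", "z"])
old = encode_column(["a", "b", "c"], ["a", "c", "z"], unknown_as_nan=False)

print(new)  # seen values keep codes 0 and 2; the unseen value becomes nan
print(old)  # seen values keep codes 0 and 2; the unseen value becomes n = 3
```

In the old mapping the unseen value receives a code that looks like one more ordinary category, while in the new mapping it is marked as missing, which a downstream model such as XGBoost can treat separately.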
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.