As a follow-up to #701, I suggest that:
- We replace the notebook currently named "Beyond linear separation in classification" with a new notebook named "Non-linear feature engineering for Logistic Regression".
- In this notebook we reuse the same 2D synthetic moons and Gaussian quantiles datasets.
- We start with a logistic regression and show that it underfits (see the sketch right below).
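Something along these lines for the opening cells, assuming `make_moons` and `make_gaussian_quantiles` as the dataset generators (sample sizes, noise level and seeds are only illustrative):

```python
from sklearn.datasets import make_gaussian_quantiles, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Two 2D synthetic datasets; sizes, noise and seeds are illustrative.
X_moons, y_moons = make_moons(n_samples=500, noise=0.13, random_state=42)
X_gauss, y_gauss = make_gaussian_quantiles(
    n_samples=500, n_features=2, n_classes=2, random_state=42
)

# A plain logistic regression underfits both datasets: its linear decision
# boundary cannot separate interleaved moons or concentric classes.
for name, (X, y) in {
    "moons": (X_moons, y_moons),
    "gaussian quantiles": (X_gauss, y_gauss),
}.items():
    scores = cross_val_score(LogisticRegression(), X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```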
- Then we build more and more complex pipelines with different preprocessors (sketched below):
  - `KBinsDiscretizer`
  - `SplineTransformer`
- We observe that those transformers apply axis-aligned non-linear transformations, which lead to axis-aligned classification decision boundaries.
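For example (the `n_bins`, `degree` and `n_knots` values are placeholders to tune in the actual notebook):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, SplineTransformer

# Each preprocessor expands each input feature independently (per axis),
# so the decision boundary is built from axis-aligned pieces.
binned_lr = make_pipeline(
    KBinsDiscretizer(n_bins=8, encode="onehot"),  # n_bins is illustrative
    LogisticRegression(),
)
spline_lr = make_pipeline(
    SplineTransformer(degree=3, n_knots=5),  # defaults, shown for clarity
    LogisticRegression(),
)
```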
- We explore modeling multiplicative interactions between the derived features with:
  - `KBinsDiscretizer` with sparse output followed by `PolynomialFeatures(degree=2, interaction_only=True)`
  - `SplineTransformer` followed by `Nystroem` (either with `kernel="rbf"` and a good value of `gamma`, or with `kernel="poly"` and `degree=2`), as in the sketch below
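A possible sketch of both variants (here too, `gamma`, `n_components` and the other hyperparameter values are guesses to be tuned):

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures, SplineTransformer

# Variant 1: one-hot bins, then products of pairs of bin indicators;
# the sparse one-hot output keeps the pairwise expansion tractable.
binned_interactions_lr = make_pipeline(
    KBinsDiscretizer(n_bins=8, encode="onehot"),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(),
)

# Variant 2: splines, then an approximate kernel expansion that combines
# the spline features across the two axes.
spline_kernel_lr = make_pipeline(
    SplineTransformer(degree=3, n_knots=5),
    Nystroem(kernel="rbf", gamma=1.0, n_components=100),
    LogisticRegression(),
)
```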
- Then we add a new exercise with:
  - the half moons dataset only
  - `SVC(kernel="linear")` (this should give similar underfitting results as the logistic regression from the previous notebook)
  - `make_pipeline(Nystroem(kernel="rbf", gamma=some_gamma, n_components=300), SVC(kernel="linear"))`
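Concretely, the exercise could contrast the two models (`some_gamma` is the placeholder from this proposal, left for the learner to tune):

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# On the raw 2D features, a linear SVC should underfit the half moons
# in the same way the plain logistic regression did.
linear_svc = SVC(kernel="linear")

# The same linear SVC becomes expressive after an approximate RBF kernel
# expansion; some_gamma is a value to be tuned.
some_gamma = 1.0
kernelized_svc = make_pipeline(
    Nystroem(kernel="rbf", gamma=some_gamma, n_components=300),
    SVC(kernel="linear"),
)
```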
- Then we can optionally suggest trying `MLPClassifier` on this dataset to get somewhat similar results, e.g. as sketched below.
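For instance, with the same moons data as above (the hidden layer sizes and iteration budget are purely illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.13, random_state=42)

# A small MLP learns a non-linear boundary directly from the raw 2D
# features, without any explicit feature engineering.
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
scores = cross_val_score(mlp, X, y, cv=5)
print(f"MLP on moons: {scores.mean():.3f} +/- {scores.std():.3f}")
```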