Open danilobellini opened 1 year ago
@bbengfort Will we need to upgrade to scikit-learn v1.3.0? I worry about calling `._validate_steps()`.
The `_validate_steps` method in a scikit-learn `Pipeline` is a private method used to check whether the steps of the pipeline are defined correctly. All steps up to the final one should be transformers (i.e., they should have `fit` and `transform` methods), and the final step should be an estimator (i.e., it should have a `fit` method).
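The rule above can be sketched without scikit-learn at all. This is a simplified stand-in written for illustration, not scikit-learn's implementation: `check_steps`, `Scaler`, and `Model` are hypothetical names, and the real method also handles cases this sketch ignores (for example `"passthrough"` steps and step-name checks).

```python
def check_steps(steps):
    """Simplified stand-in for the rule described above (not scikit-learn's code)."""
    # Split the (name, object) pairs into the intermediate steps and the final one.
    *intermediate, (final_name, final_step) = steps
    for name, step in intermediate:
        # Intermediate steps must behave like transformers.
        if not (hasattr(step, "fit") and hasattr(step, "transform")):
            raise TypeError(
                f"intermediate step {name!r} must implement fit and transform"
            )
    # The final step only needs to be fittable.
    if not hasattr(final_step, "fit"):
        raise TypeError(f"final step {final_name!r} must implement fit")


class Scaler:
    """Hypothetical transformer: has both fit and transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Model:
    """Hypothetical estimator: has fit only."""

    def fit(self, X, y=None):
        return self


# A well-formed sequence of steps passes silently.
check_steps([("scale", Scaler()), ("model", Model())])

# Putting the fit-only object in an intermediate position is rejected.
message = None
try:
    check_steps([("model", Model()), ("scale", Scaler())])
except TypeError as exc:
    message = str(exc)
```

The check is duck-typed: it looks only at which methods an object exposes, not at its class.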
Calling `_validate_steps()` explicitly in your test cases will make sure that this validation is performed at the moment you define the pipeline, rather than later when you try to `fit` or `transform` data with the pipeline.
In @danilobellini's code, adding `_validate_steps()` after the `Pipeline` or `VisualPipeline` instantiation will cause the validation to happen immediately. This means that if there's a problem with the steps (e.g., a non-transformer object in an intermediate step, or a non-estimator object as the final step), a `TypeError` will be raised immediately, rather than later on when you try to use the pipeline.
This could make the tests clearer and more direct, as he is specifically testing the validation of the pipeline steps, and it's useful to have that validation happen as explicitly and immediately as possible. However, I'm aware that `_validate_steps` is a private method (indicated by the leading underscore), which means that it's not part of the public API of the `Pipeline` class and could potentially change in future versions of scikit-learn. Using private methods can sometimes lead to less stable code, as they're not guaranteed to stay the same in the way that public methods are.
This is a required step because the `Pipeline._validate_steps` method is not called by scikit-learn during construction, but the tests expect that validation to have happened. I tried running it with scikit-learn v1.2.2 and scikit-learn v1.3.0. I found that the validation was removed upstream from `Pipeline.__init__` in commit 0110921, for v1.1.0.

I'm currently the maintainer of yellowbrick's AUR package and I'm having trouble updating the package to v1.5.0 because of failing tests. The test mentioned here is one of them; I brought it into this pull request because it was simple to fix.
I've run `black` and `pytest` on the resulting file.