Closed pangjac closed 4 months ago
Merging #703 (06bbf15) into main (a819eb1) will increase coverage by 0.00%. Report is 3 commits behind head on main. The diff coverage is 100.00%.
:exclamation: Current head 06bbf15 differs from pull request most recent head 44d3b16. Consider uploading reports for the commit 44d3b16 to get more accurate results.
@@ Coverage Diff @@
## main #703 +/- ##
=======================================
Coverage 98.03% 98.04%
=======================================
Files 100 100
Lines 3877 3878 +1
Branches 761 761
=======================================
+ Hits 3801 3802 +1
Misses 28 28
Partials 48 48
Files | Coverage Δ | |
---|---|---|
...re_engine/selection/smart_correlation_selection.py | 97.67% <100.00%> (+0.02%) | :arrow_up: |
Thank you @pangjac!! That was by far the easiest solution of all the ones we've contemplated.
Could you add it in this transformer as well?: https://github.com/feature-engine/feature_engine/blob/main/feature_engine/selection/drop_correlated_features.py
Would it be possible to add a test that fails with the original implementation but passes with your changes?
Hi @solegalli, I don't see any related issues for `drop_correlated_features`. Can you specify which issue this relates to?

> Could you add it in this transformer as well?: https://github.com/feature-engine/feature_engine/blob/main/feature_engine/selection/drop_correlated_features.py

Also,

> add a test that fails with the original implementation but passes with your changes?

I noticed that all of the tests in CircleCI test_feature_engine_py3x failed. I assume that these CircleCI tests are running on the `main` branch? I added 4 sub-test methods to show that `features_to_drop_` is not consistent in the `main` branch.
If you switch to the branch `bug/smartcorrselect-inconsist` and run `pytest -s tests/test_selection/test_smart_correlation_selection.py`, you will see that all of the newly added sub-tests pass. In this regard, what is the best way to add new pytest tests and include them in this pull request (if CircleCI automatically fails on the `main` branch)?
Hey @pangjac
The tests are failing because seaborn is not a dependency of feature-engine, so it is not installed in the test environment on CircleCI.
Could we use a hand-crafted dataset for the test instead?
There is no issue associated with the `drop_correlated_features` class, but it has the same problem, because smart correlation is just an upgrade of this simpler transformer.
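As a rough illustration of the hand-crafted dataset suggested above, the test data could be built with pandas alone, so seaborn is not needed. This is only a sketch: the column names, the values and the 0.8 threshold are assumptions made for illustration, not the data used in this PR.

```python
import pandas as pd

# Hypothetical hand-crafted data: "a", "b" and "c" are perfectly correlated
# in absolute value and share the same standard deviation, so any selection
# rule based on variance has to break a three-way tie.
X = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [2, 3, 4, 5, 6, 7],  # a + 1
        "c": [6, 5, 4, 3, 2, 1],  # a reversed
        "d": [2, 9, 1, 7, 3, 5],  # roughly uncorrelated with the others
    }
)

corr = X.corr().abs()  # absolute Pearson correlation matrix
stds = X.std()         # per-column standard deviation

threshold = 0.8  # assumed correlation threshold
correlated_with_a = [
    col for col in X.columns if col != "a" and corr.loc["a", col] > threshold
]
```

Because the tied columns are constructed by hand, a test built on data like this can assert the exact expected `features_to_drop_` instead of relying on an external dataset.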
This PR has been superseded by #721
This PR aims to fix the issues reported in #570 and #684, where the `SmartCorrelatedSelection` attribute `features_to_drop_` may not return the same result when run multiple times.
The issue is caused by the fact that the `sort_values()` method does not guarantee a consistent order between columns if they share the same value.
The fix is to ensure consistent ordering by sorting by the value first, and then by the feature name alphabetically for equal values.
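A minimal sketch of the tie-breaking idea, assuming pandas is available. The `order_features` helper and the feature names are hypothetical and are not the code changed in this PR; the snippet only shows that sorting on (value, name) yields the same order no matter how the input is arranged.

```python
import pandas as pd


def order_features(scores: pd.Series, ascending: bool = False) -> list:
    """Sort feature names by score, breaking ties alphabetically by name.

    Hypothetical helper for illustration only.
    """
    frame = scores.rename("score").rename_axis("feature").reset_index()
    # Primary key: the score. Secondary key: the feature name, always A-Z.
    frame = frame.sort_values(by=["score", "feature"], ascending=[ascending, True])
    return frame["feature"].tolist()


# "var_a" and "var_c" share the same score, so their order depends on the tie-break.
scores = pd.Series({"var_c": 0.5, "var_a": 0.5, "var_b": 0.9, "var_d": 0.1})

ordered = order_features(scores)
# Same scores presented in a different order give the same result.
reordered = order_features(scores.loc[["var_a", "var_d", "var_c", "var_b"]])

print(ordered)  # ['var_b', 'var_a', 'var_c', 'var_d']
```

Since feature names are unique, the pair (value, name) defines a total order, so the result does not depend on the sorting algorithm or on the original column order.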
The tests pass when run with `pytest tests/test_selection/test_smart_correlation_selection.py`. Other minor changes were applied automatically by `black` or `isort`.
Please review.