Closed david-cortes closed 9 months ago
Moved the check to `check_X` instead, as suggested above.
Regarding errors from pandas, it does throw errors sometimes when there are duplicates, but only under some particular situations:
https://pandas.pydata.org/pandas-docs/stable/user_guide/duplicates.html
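To illustrate why duplicates are hard to track, here is a minimal sketch of two behaviours that I believe match what the linked guide describes. The frame `df` and series `s` are made up for illustration and are not taken from this PR.

```python
import pandas as pd

# Toy frame with a duplicated column name (illustrative data only)
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])

# No error here: selecting a duplicated label silently returns
# a DataFrame with both columns instead of a Series
selected = df["a"]
print(type(selected).__name__, selected.shape)

# Reindexing an axis that holds duplicate labels, on the other hand, raises
s = pd.Series([0, 1, 2], index=["a", "b", "b"])
try:
    s.reindex(["a", "b", "c"])
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print(err)
```

So depending on which operation a transformer runs first, the user may get an unrelated-looking error or no error at all.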
Also changed the mechanism to use the attribute `is_unique`, as it seems that's what they recommend in their guide.
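A minimal sketch of what an `is_unique`-based check could look like. The helper name `check_unique_columns` and the error wording are assumptions for illustration, not the actual code in this PR.

```python
import pandas as pd


def check_unique_columns(X: pd.DataFrame) -> None:
    """Raise ValueError if X has duplicated column names (hypothetical helper)."""
    if not X.columns.is_unique:
        # Index.duplicated() flags every repeat after the first occurrence
        duplicated = X.columns[X.columns.duplicated()].unique().tolist()
        raise ValueError(
            f"Input data contains duplicated variable names: {duplicated}"
        )


X_ok = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
check_unique_columns(X_ok)  # unique names: passes silently

X_dup = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
try:
    check_unique_columns(X_dup)
except ValueError as err:
    print(err)  # Input data contains duplicated variable names: ['a']
```

`Index.is_unique` is a plain boolean property, so the check stays cheap on the common path where all names are unique.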
Moved the tests to `test_dataframe_checks.py`.
Added a check on the error message.
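For reference, a check on the error message could be sketched with `pytest.raises(..., match=...)` roughly as follows. The function under test and the message text are hypothetical stand-ins, not the code or wording from this PR.

```python
import pandas as pd
import pytest


def check_unique_columns(X: pd.DataFrame) -> None:
    # Hypothetical stand-in for the check added in this PR
    if not X.columns.is_unique:
        raise ValueError("Input data contains duplicated variable names.")


def test_raises_error_if_duplicated_column_names():
    X = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
    # match= is applied as a regex search on the string form of the exception
    with pytest.raises(ValueError, match="duplicated variable names"):
        check_unique_columns(X)
```

Matching on a fragment of the message rather than the full string keeps the test from breaking on small wording changes.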
Hey @david-cortes
I made a PR to your repo: https://github.com/david-cortes/feature_engine/pull/4
There I rebased main and added this contribution to the changelog.
Would you have time to merge over there, so it updates here and I can merge and close?
Thanks a lot!
Thanks, although I think you should also be able to push changes to the branch directly.
Merging #686 (a53a7bd) into main (3343305) will increase coverage by 0.00%. Report is 1 commits behind head on main. The diff coverage is 100.00%.
@@ Coverage Diff @@
## main #686 +/- ##
=======================================
Coverage 97.99% 97.99%
=======================================
Files 100 100
Lines 3843 3849 +6
Branches 754 752 -2
=======================================
+ Hits 3766 3772 +6
Misses 28 28
Partials 49 49
Files Changed | Coverage Δ | |
---|---|---|
feature_engine/creation/math_features.py | 97.77% <ø> (ø) | |
feature_engine/dataframe_checks.py | 97.05% <100.00%> (+0.08%) | :arrow_up: |
feature_engine/datetime/datetime.py | 100.00% <100.00%> (ø) | |
feature_engine/datetime/datetime_subtraction.py | 94.73% <100.00%> (+0.07%) | :arrow_up: |
feature_engine/encoding/base_encoder.py | 100.00% <100.00%> (ø) | |
feature_engine/encoding/one_hot.py | 100.00% <100.00%> (ø) | |
feature_engine/encoding/rare_label.py | 100.00% <100.00%> (ø) | |
feature_engine/imputation/categorical.py | 95.31% <100.00%> (ø) | |
feature_engine/selection/shuffle_features.py | 100.00% <100.00%> (ø) | |
feature_engine/transformation/yeojohnson.py | 100.00% <100.00%> (ø) | |
... and 1 more | | |
This PR adds an informative error message for cases in which the user supplies inputs with duplicated column names, which otherwise surface as hard-to-track errors (e.g. https://github.com/feature-engine/feature_engine/pull/681).