feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Add the inverse_transform method in the yeoJohnson transformer #679

Closed: GiorgioSgl closed this 1 year ago

GiorgioSgl commented 1 year ago

Add the inverse_transform method to the Yeo-Johnson transformer.

The sklearn PowerTransformer documentation explains it as follows:

    The inverse of the Box-Cox transformation is given by:

        if lambda_ == 0:
            X = exp(X_trans)
        else:
            X = (X_trans * lambda_ + 1) ** (1 / lambda_)

    The inverse of the Yeo-Johnson transformation is given by:

        if X >= 0 and lambda_ == 0:
            X = exp(X_trans) - 1
        elif X >= 0 and lambda_ != 0:
            X = (X_trans * lambda_ + 1) ** (1 / lambda_) - 1
        elif X < 0 and lambda_ != 2:
            X = 1 - (-(2 - lambda_) * X_trans + 1) ** (1 / (2 - lambda_))
        elif X < 0 and lambda_ == 2:
            X = 1 - exp(-X_trans)

Since the Yeo-Johnson transformation maps 0 to 0 and is monotonically increasing, X and X_trans always have the same sign, so the branch can be chosen by checking X_trans >= 0.

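
To make the quoted branches concrete, here is a minimal scalar sketch in plain Python (standard-library `math` only). The helper names `yeo_johnson` and `yeo_johnson_inverse` are hypothetical, chosen for illustration; they are not feature_engine's or sklearn's API, and a real transformer would operate on pandas/NumPy columns. The round-trip loop uses lambda values that hit every branch, including the special cases lambda_ == 0 and lambda_ == 2.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Forward Yeo-Johnson transform of a single value (hypothetical helper)."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)                      # log(x + 1)
        return ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)                        # -log(1 - x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


def yeo_johnson_inverse(x_trans: float, lmbda: float) -> float:
    """Inverse Yeo-Johnson transform, one branch per quoted case (hypothetical helper).

    The branch is selected from the sign of x_trans, which matches the sign of x.
    """
    if x_trans >= 0:
        if lmbda == 0:
            return math.expm1(x_trans)                # exp(X_trans) - 1
        return (x_trans * lmbda + 1) ** (1 / lmbda) - 1
    if lmbda == 2:
        return -math.expm1(-x_trans)                  # 1 - exp(-X_trans)
    return 1 - (-(2 - lmbda) * x_trans + 1) ** (1 / (2 - lmbda))


# Round trip: inverse(forward(x)) should recover x for every branch.
for lmbda in (-1.5, 0.0, 0.5, 1.0, 2.0, 3.0):
    for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
        x_back = yeo_johnson_inverse(yeo_johnson(x, lmbda), lmbda)
        assert math.isclose(x_back, x, rel_tol=1e-9, abs_tol=1e-9)
```

Using `math.log1p` and `math.expm1` instead of `log(1 + x)` and `exp(x) - 1` preserves precision for values near zero. The exact `== 0` and `== 2` checks mirror the quoted pseudocode; a production version might compare lambda against a small tolerance instead.
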
GiorgioSgl commented 1 year ago

I don't know why some of the rare label encoder tests are failing. They passed in the previous commit 3bfc6d4047b44cb84bce0ef42e5e5dad9f450695, and since then I have only changed the code style (by running black).

solegalli commented 1 year ago

Hey @glevv @ClaudioSalvatoreArcidiacono @dlaprins, would any of you have time to have a look at these 2 tests and see why they fail?

=========================== short test summary info ============================
FAILED tests/test_encoding/test_rare_label_encoder.py::test_correctly_ignores_nan_in_fit - AssertionError: assert {'var_A': ['B...C', 'B', 'A']} == {'var_A': ['B...D'...
FAILED tests/test_encoding/test_rare_label_encoder.py::test_correctly_ignores_nan_in_fit_when_var_is_numerical - AssertionError: assert {'var_A': ['B....0, 2.0, 1.0]} == {'var_A': ['B... [...
========== 2 failed, 1434 passed, 2924 warnings in 176.15s (0:02:56) ===========
py39: exit 1 (177.96 seconds) /home/circleci/project> pytest tests pid=498
  py39: FAIL code 1 (202.37=setup[24.41]+cmd[177.96] seconds)
  evaluation failed :( (202.44 seconds)

I think it may be related to some randomization, because it only fails on the latest Python versions.

If you have time, I'd greatly appreciate it if you could fix them in a different PR :)

GiorgioSgl commented 1 year ago

I will go through all the comments this weekend.

GiorgioSgl commented 1 year ago

I just fixed the small issues; let me know if I need to fix anything else.

This weekend I will look into the problems with the tests for the different lambda values.

GiorgioSgl commented 1 year ago

I've addressed all the review comments you gave me. Let me know if there's anything else!

solegalli commented 1 year ago

Hey @GiorgioSgl

I made a PR to your repo to sort the failing tests: https://github.com/GiorgioSgl/feature_engine/pull/1

Would you be able to merge it so that this PR gets updated?