Feature engineering package with sklearn-like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Feature-engine

Feature-engine is a Python library of transformers for engineering and selecting features for machine learning models. Its transformers follow the scikit-learn API: fit() learns the transformation parameters from the data, and transform() applies them.

Feature-engine features in the following resources

Blogs about Feature-engine

Documentation

Psst! How did you find us?

We want to share Feature-engine with more people. It'd help us loads if you tell us how you discovered us.

Then we'd know what we are doing right and which channels to use to share the love.

Please share your story by answering 1 quick question at this link. 😃

Feature-engine's transformers currently include functionality for:

Imputation methods

Encoding methods

Discretisation methods

Outlier handling methods

Variable transformation methods

Variable creation

Feature selection

Datetime

Time series

Pipelines

Preprocessing

Wrappers

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found on the Contribute page.

Briefly:

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first install the dependencies from the root directory:

pip install -r docs/requirements.txt

Now you can build the docs using:

sphinx-build -b html docs build

License

The content of this repository is licensed under a BSD 3-Clause license.

Sponsor us

Sponsor us and further support our mission to democratize machine learning and programming tools through open-source software.