alteryx / autonormalize

Python library for automated dataset normalization
https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/
BSD 3-Clause "New" or "Revised" License

AutoNormalize

AutoNormalize is a Python library for automated datatable normalization. It allows you to build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.
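
To make the idea of normalization concrete, here is a minimal pure-Python sketch; it is not AutoNormalize's implementation. It assumes a hypothetical orders table in which customer_city depends only on customer_id. The helper split_table and all column names are illustrative and do not come from the library.

```python
def split_table(rows, key, dependents):
    """Move the `dependents` columns into a separate table keyed by `key`.

    rows: list of dicts (one dict per row of the denormalized table).
    Returns (main_rows, lookup_rows).
    """
    lookup = {}
    main_rows = []
    for row in rows:
        # Keep one lookup row per distinct key value (first occurrence wins).
        lookup.setdefault(
            row[key], {key: row[key], **{col: row[col] for col in dependents}}
        )
        # The main table keeps the key as a reference and drops the dependents.
        main_rows.append({c: v for c, v in row.items() if c not in dependents})
    return main_rows, list(lookup.values())


orders = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Oslo"},
    {"order_id": 2, "customer_id": 10, "customer_city": "Oslo"},
    {"order_id": 3, "customer_id": 11, "customer_city": "Rome"},
]
main_rows, customers = split_table(orders, "customer_id", ["customer_city"])
```

After the split, the city of each customer is stored once in customers, and each order only references the customer by id.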

Getting Started

Install

pip install featuretools[autonormalize]

Uninstall

pip uninstall autonormalize

Demos

API Reference

auto_entityset

auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe.

Arguments:

Returns:

find_dependencies

find_dependencies(df, accuracy=0.98, index=None)

Finds dependencies within a dataframe using the DFD search algorithm.
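
The sketch below illustrates how a single candidate dependency could be tested against an accuracy threshold. It is not the DFD search itself, which decides which candidates to test, and it is not taken from the library. It assumes that accuracy means the fraction of rows that must satisfy the dependency; the function dependency_holds and the sample columns are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_holds(rows, lhs, rhs, accuracy=0.98):
    """Return True if lhs -> rhs holds for at least `accuracy` of the rows.

    rows: list of dicts; lhs: list of column names; rhs: a column name.
    """
    if not rows:
        return True
    # Count the RHS values seen for each distinct combination of LHS values.
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[col] for col in lhs)][row[rhs]] += 1
    # Within a group, rows that agree with the most common RHS value
    # are counted as satisfying the dependency.
    satisfied = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return satisfied / len(rows) >= accuracy


rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},  # inconsistent spelling breaks zip -> city
]
strict = dependency_holds(rows, ["zip"], "city", accuracy=0.98)
relaxed = dependency_holds(rows, ["zip"], "city", accuracy=0.75)
```

Under this interpretation, lowering the threshold tolerates a limited number of noisy rows, at the cost of possibly accepting dependencies that do not truly hold.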

Returns:

normalize_dataframe

normalize_dataframe(df, dependencies)

Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:

  1. shortest length
  2. has "id" in some form in the name of an attribute
  3. has the attribute furthest to the left in the table
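
The priority order above can be sketched as a sort key. This is an illustration under stated assumptions, not the library's code: "length" is read as the number of attributes in a key, the "id" check is a simple case-insensitive substring match, and choose_key and the column names are hypothetical.

```python
def choose_key(candidate_keys, column_order):
    """Pick one key from candidate_keys, each a tuple of column names.

    column_order: list of all column names, left to right, as in the table.
    """
    def priority(key):
        has_id = any("id" in col.lower() for col in key)
        leftmost = min(column_order.index(col) for col in key)
        # Tuples compare element by element: fewer attributes first,
        # then keys with "id" in a name (False sorts before True),
        # then the key whose leftmost attribute appears earliest.
        return (len(key), not has_id, leftmost)

    return min(candidate_keys, key=priority)


columns = ["order_id", "customer_name", "email", "city"]
by_position = choose_key(
    [("customer_name", "city"), ("email",), ("customer_name",)], columns
)
by_id = choose_key([("email",), ("customer_id",)], ["email", "customer_id"])
```

In the first call, both single-attribute keys beat the two-attribute key, and customer_name wins because it sits further left than email. In the second call, customer_id wins on the "id" rule even though email is further left.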

Returns:


make_entityset

make_entityset(df, dependencies, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe based on the dependencies given. Keys are chosen in the same fashion as for normalize_dataframe, and a new index will be created if any key has more than a single attribute.
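
One way to picture the new index for a multi-attribute key is a surrogate integer id assigned per distinct key combination. The sketch below is an assumption about the general technique, not a description of how make_entityset builds its index; add_surrogate_index, person_id, and the sample columns are hypothetical.

```python
def add_surrogate_index(rows, key_columns, index_name):
    """Give each distinct combination of key_columns an integer id.

    rows: list of dicts. Returns (new_rows, lookup), where lookup maps
    each key tuple to its id.
    """
    lookup = {}
    new_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # Ids are assigned in order of first appearance, starting at 0.
        idx = lookup.setdefault(key, len(lookup))
        new_rows.append({index_name: idx, **row})
    return new_rows, lookup


people = [
    {"first": "Ann", "last": "Lee", "city": "Oslo"},
    {"first": "Bob", "last": "Lee", "city": "Rome"},
    {"first": "Ann", "last": "Lee", "city": "Oslo"},
]
indexed, lookup = add_surrogate_index(people, ["first", "last"], "person_id")
```

A single-column index like this lets other tables reference a row with one value instead of repeating every attribute of the composite key.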

Returns:


normalize_entityset

normalize_entityset(es, accuracy=0.98)

Returns a new normalized EntitySet from an EntitySet with a single entity.

Arguments:

Returns:


Built at Alteryx Innovation Labs