alteryx / featuretools

An open source python library for automated feature engineering
https://www.featuretools.com
BSD 3-Clause "New" or "Revised" License
7.21k stars 873 forks

Manually remove specific features from feature definitions list #645

Closed jonimatix closed 5 years ago

jonimatix commented 5 years ago

Hello,

Is it possible to remove a few features from a feature definition list after saving it with ft.save_features(features, "feature_definitions.json")?

For example, I need to remove 10 specific definitions without corrupting the structure of the JSON file.

Is it possible to do so at the moment using specific code or functionality?

Thanks a lot for your help, Jon

gsheni commented 5 years ago

Yes, you can do this.

Let's say you want to select the features that have "amount" in their names. You can check for this by calling get_name() on each feature definition.

import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="customers")

features_with_amount = []
for x in feature_defs:
    if 'amount' in x.get_name():
        features_with_amount.append(x)
ft.save_features(features_with_amount, 'feature_definitions_amount.json')
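
To drop a known set of features instead of keeping matches, the same check can be inverted. This is a minimal sketch, not part of featuretools: `StandInFeature` is a hypothetical stand-in so the snippet runs without the library, `drop_features` is an illustrative helper name, and the feature names are made up. The only assumption about real feature objects is that they expose `get_name()`, as in the loop above.

```python
class StandInFeature:
    """Hypothetical stand-in for a feature definition; only get_name() is modeled."""

    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


def drop_features(feature_defs, names_to_remove):
    """Return the features whose exact name is NOT in names_to_remove."""
    names_to_remove = set(names_to_remove)  # set gives fast membership checks
    return [f for f in feature_defs if f.get_name() not in names_to_remove]


# Illustrative names only; real names come from f.get_name() on your own features.
feature_defs = [
    StandInFeature("zip_code"),
    StandInFeature("SUM(transactions.amount)"),
    StandInFeature("COUNT(sessions)"),
]

kept = drop_features(feature_defs, ["zip_code", "COUNT(sessions)"])
print([f.get_name() for f in kept])  # ['SUM(transactions.amount)']
```

With real feature definitions, the resulting list could then be passed to ft.save_features in the same way as features_with_amount above.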

Another thing you might want to do is to only select features that are aggregation features.

from featuretools import AggregationFeature

features_only_aggregations = []
for x in feature_defs:
    if isinstance(x, AggregationFeature):
        features_only_aggregations.append(x)
ft.save_features(features_only_aggregations, 'feature_definitions_aggregation.json')

Also, you might only want to select features that are calculated at a certain depth. You can do this by using the get_depth function.

features_only_depth_2 = []
for x in feature_defs:
    if x.get_depth() == 2:
        features_only_depth_2.append(x)
ft.save_features(features_only_depth_2, 'feature_definitions_depth_2.json')

Finally, you might only want features that return a certain type. You can do this by checking the variable_type attribute of each feature.

from featuretools.variable_types import Numeric

features_only_numeric = []
for x in feature_defs:
    if x.variable_type == Numeric:
        features_only_numeric.append(x)
ft.save_features(features_only_numeric, 'feature_definitions_numeric.json')
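
If the definitions are already saved to disk, one way to avoid editing the JSON by hand is to load the file, filter the list, and save it again. The sketch below assumes ft.load_features accepts the path that ft.save_features wrote to and returns a list of features. `remove_from_saved_features` is a hypothetical helper name, and raising on unknown names is my own design choice, not library behavior.

```python
def remove_from_saved_features(src_path, dst_path, names_to_remove):
    """Load saved feature definitions, drop the named ones, and save the rest.

    Returns the list of features that were kept.
    """
    # Imported here so the helper can be defined even where featuretools
    # is not installed; it is only needed when the function is called.
    import featuretools as ft

    names_to_remove = set(names_to_remove)
    features = ft.load_features(src_path)

    # Fail loudly on typos: a misspelled name would otherwise be ignored silently.
    missing = names_to_remove - {f.get_name() for f in features}
    if missing:
        raise KeyError(f"No saved feature named: {sorted(missing)}")

    kept = [f for f in features if f.get_name() not in names_to_remove]
    ft.save_features(kept, dst_path)
    return kept
```

A call might look like remove_from_saved_features("feature_definitions.json", "feature_definitions_trimmed.json", names), where names holds the exact get_name() strings of the 10 features to drop. Because featuretools writes the file itself, the JSON structure stays valid. Note that a removed feature may still appear in the file as a dependency if one of the kept features is built on it.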

jonimatix commented 5 years ago

Amazing, thank you so much!