Using certain combinations of primitives during dfs can result in a warning being raised. The warning message indicates there is likely a bug, but the behavior is actually expected based on these primitives. An example of the warning is shown below:
featuretools - WARNING Attempting to add feature <Feature: val2 / 1 / val2> which is already present. This is likely a bug.
The warning is easily reproduced with either of the following primitive groups and max_depth=2, provided the primitives that take a scalar all use the same scalar value:
['divide_numeric_scalar', 'divide_by_feature', 'divide_numeric'] or ['modulo_numeric_scalar', 'modulo_by_feature', 'modulo_numeric']
The warning is raised because these groups of primitives offer multiple paths to the same feature.
For example, the feature val2 / 1 / val2 can be created by first creating val2 / 1 with divide_numeric_scalar and then dividing that by val2 with divide_numeric. The second path first creates 1 / val2 with divide_by_feature and then divides val2 by that feature with divide_numeric.
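The two paths can be sketched without featuretools at all. The helper functions below are hypothetical stand-ins that only imitate how the three primitives appear to label their output (operands joined with " / " and no parentheses); they are not featuretools functions, and the sketch assumes the default scalar of 1 for both scalar primitives.

```python
# Hypothetical helpers that imitate the feature names from the warning.
# They are NOT part of featuretools; they only build name strings.
def divide_numeric_scalar(name, value=1):
    return f"{name} / {value}"


def divide_by_feature(name, value=1):
    return f"{value} / {name}"


def divide_numeric(left, right):
    return f"{left} / {right}"


# Path 1: build (val2 / 1) first, then divide it by val2.
path_one = divide_numeric(divide_numeric_scalar("val2"), "val2")

# Path 2: build (1 / val2) first, then divide val2 by it.
path_two = divide_numeric("val2", divide_by_feature("val2"))

print(path_one)
print(path_two)
print(path_one == path_two)
```

Both paths print `val2 / 1 / val2` and the comparison prints `True`. Note that the sketch compares names only: the nestings (val2 / 1) / val2 and val2 / (1 / val2) are not arithmetically identical, so the collision shown here is in the generated name.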
Ideally we would update the feature generation process so that this warning is not raised where these duplicates are expected, or possibly avoid stacking features in a way that creates this situation.
Other options to consider:
Change the default values to reduce the chance that this will happen. If the primitives use different scalars, the warning is not raised.
Don't use the primitives that operate on a scalar unless the user has set a scalar value (a numeric feature divided by 1 isn't valuable anyway).
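The first option can be checked with a rough enumeration. This is a simplified sketch, not how dfs builds features: `stacked_names` and `duplicated` are hypothetical helpers that pair every column and depth-1 name with every other one, which is cruder than the real stacking rules, and the scalar values 2 and 3 are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import permutations


def stacked_names(columns, scalar_value, by_feature_value):
    """Names produced by dividing any candidate by any other candidate.

    Hypothetical helper: candidates are the raw columns plus the depth-1
    names that divide_numeric_scalar and divide_by_feature would produce.
    """
    depth_one = []
    for col in columns:
        depth_one.append(f"{col} / {scalar_value}")      # like divide_numeric_scalar
        depth_one.append(f"{by_feature_value} / {col}")  # like divide_by_feature
    candidates = list(columns) + depth_one
    return [f"{a} / {b}" for a, b in permutations(candidates, 2)]


def duplicated(names):
    """Return the names that were generated more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Same scalar (1) in both scalar primitives: names collide.
same = duplicated(stacked_names(["val1", "val2"], 1, 1))

# Different scalars (2 and 3, chosen arbitrarily): no collisions.
different = duplicated(stacked_names(["val1", "val2"], 2, 3))

print(same)
print(different)
```

With matching scalars the duplicate list includes `val2 / 1 / val2`; with differing scalars it is empty, because a name such as `val2 / 2 / val2` can then be reached only through divide_numeric_scalar.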
This issue is closely related to #832 and both potentially could be fixed together.
The following code can be used to reproduce this warning:
import pandas as pd
import featuretools as ft

# Build a single-entity EntitySet with two numeric columns
es = ft.EntitySet('es')
df = pd.DataFrame({
    'index': [0, 1, 2],
    'val1': [1, 2, 1],
    'val2': [10, 20, 30],
})
es.entity_from_dataframe(dataframe=df, entity_id='entity', index='index')

primitives = ft.list_primitives()
# Both scalar primitives use their default scalar value here
trans_primitives = ['divide_numeric_scalar', 'divide_by_feature', 'divide_numeric']
agg_primitives = []

# max_depth=2 lets divide_numeric stack on the scalar features, which triggers the warning
fm, features = ft.dfs(entityset=es,
                      target_entity='entity',
                      trans_primitives=trans_primitives,
                      agg_primitives=agg_primitives,
                      max_depth=2)