alteryx / featuretools

An open source python library for automated feature engineering
https://www.featuretools.com
BSD 3-Clause "New" or "Revised" License

Error while adding a relationship #49

Closed ab-anssi closed 6 years ago

ab-anssi commented 6 years ago

I am trying to extract features automatically with featuretools from synthetic data I have created. An error is raised when I add one particular relationship, es.add_relationship(ips_flows) (line 12 in synethetic_flows.py). The error says that the key ip does not exist, although it is present in the input data frame.

Is this a bug, or do the linked columns need to meet specific constraints?

Context:

synthetic_flows.zip

kmax12 commented 6 years ago

@ab-anssi the problem here is that relationships must be one-to-many. In your code, the relationship between main and ips is one-to-one, since it is defined on the index columns of both entities.
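To make the one-to-many constraint concrete, here is a minimal sketch that classifies a link between two tables by checking key uniqueness with pandas. It is not part of featuretools: the helper `relationship_kind` is hypothetical, and the toy data frames (including the `flow_id` and `label` columns) are made-up stand-ins for the tables in the attached archive.

```python
import pandas as pd


def relationship_kind(parent: pd.DataFrame, parent_key: str,
                      child: pd.DataFrame, child_key: str) -> str:
    """Classify a link between two tables from the uniqueness of its keys."""
    parent_unique = parent[parent_key].is_unique
    child_unique = child[child_key].is_unique
    if parent_unique and child_unique:
        return "one-to-one"
    if parent_unique:
        return "one-to-many"
    if child_unique:
        return "many-to-one"
    return "many-to-many"


# Toy stand-ins for the tables in the thread (values are made up).
ips = pd.DataFrame({"ip": ["ext_0", "ext_1"], "country": ["France", "China"]})
main = pd.DataFrame({"ip": ["ext_0", "ext_1"], "label": [0, 1]})
flows = pd.DataFrame({"flow_id": [0, 1, 2], "ip": ["ext_0", "ext_0", "ext_1"]})

# Both sides are keyed on a unique index column: not usable as a relationship.
print(relationship_kind(main, "ip", ips, "ip"))   # one-to-one
# Each ip appears once in ips and possibly many times in flows.
print(relationship_kind(ips, "ip", flows, "ip"))  # one-to-many
```

Running a check like this on each pair of linked columns before calling es.add_relationship should show which link is not one-to-many.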

If I remove main and build features for ips as the target entity, it all runs. Here is my code: synthetic_flows_kmax12.zip

However, it looks like your label is in main, so I'd recommend either merging the two tables into one before running DFS, if you need the label there, or extracting the label separately after you get your feature matrix.
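Both options can be sketched with plain pandas. The data below is made up, the `label` column name is an assumption about what main contains, and `feature_matrix` is a hand-built stand-in for the output of DFS rather than a real result.

```python
import pandas as pd

# Toy stand-ins for the two one-to-one tables (values are made up).
ips = pd.DataFrame({"ip": ["ext_0", "ext_1", "ext_2"],
                    "country": ["France", "China", "Russia"]})
main = pd.DataFrame({"ip": ["ext_2", "ext_0", "ext_1"],
                     "label": [1, 0, 1]})

# Option 1: fold main into ips before building the entity set.
# validate="one_to_one" raises if either side has duplicate keys.
ips_with_label = ips.merge(main, on="ip", how="left", validate="one_to_one")

# Option 2: keep the label out of DFS and align it with the result afterwards.
# This frame only imitates a feature matrix indexed by ip.
feature_matrix = pd.DataFrame(
    {"SUM(flows.num_bytes)": [451, 103, 692]},
    index=pd.Index(["ext_0", "ext_1", "ext_2"], name="ip"),
)
labels = main.set_index("ip")["label"].reindex(feature_matrix.index)

print(ips_with_label)
print(labels)
```

With the first option the label travels through DFS like any other column, so it would presumably need to be separated from the features before training. The second option avoids that by never giving the label to DFS.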

Does that help? Also, feel free to post questions in our gitter chat room for faster replies.

kmax12 commented 6 years ago

If the code above works, you should see a feature matrix like this:

       AS_number country  SUM(flows.duration)  SUM(flows.num_bytes)  STD(flows.duration)  STD(flows.num_bytes)  MAX(flows.duration)  MAX(flows.num_bytes)  SKEW(flows.duration)  SKEW(flows.num_bytes)               ...                MEAN(flows.SKEW(packets.num_bytes))  MEAN(flows.SKEW(packets.timestamp))  MEAN(flows.MIN(packets.num_bytes))  MEAN(flows.MIN(packets.timestamp))  MEAN(flows.MEAN(packets.num_bytes))  MEAN(flows.MEAN(packets.timestamp))  MEAN(flows.COUNT(packets))  MEAN(flows.NUM_UNIQUE(packets.flags)) NUM_UNIQUE(flows.MODE(packets.flags)) MODE(flows.MODE(packets.flags))
ip                                                                                                                                                                                                                   ...                                                                                                                                                                                                                                                                                                                                                                                   
ext_0        710    çauà         2.133139e+15                   451         3.131739e+14             17.319461         1.090809e+15                    55              2.760679               0.656015               ...                                           0.194575                             0.763481                            3.904762                        2.432896e+13                             5.847619                         7.428209e+13                    4.285714                               1.523810                                     5                             ...
ext_1        934  Russia         1.562295e+02                   103         3.725285e+01             19.601587         9.842145e+01                    58              0.059337              -0.050977               ...                                           0.112961                             0.933054                            5.333333                        3.583729e+02                             7.600000                         4.002507e+13                    5.333333                               3.000000                                     3                             ...
ext_2         43  France         5.783267e+15                   692         1.158610e+15             18.212290         4.741678e+15                    72              3.363278               0.418732               ...                                          -0.094430                             0.726827                            1.500000                        1.973225e+02                             5.993750                         3.610764e+13                    7.187500                               2.500000                                     6                             ...
ext_3        836  France         2.015486e+02                   131         4.185565e+01             23.893281         9.679447e+01                    63             -0.707107              -0.683954               ...                                          -0.130006                             0.005472                            4.000000                        1.199045e+01                             7.366667                         4.065750e+01                    7.000000                               3.333333                                     3                             ...
ext_4        459   China         6.547545e+15                   707         1.206763e+15             17.289520         5.491191e+15                    67              3.910612               0.336770               ...                                           0.175274                             1.208717                            2.250000                        2.208043e+02                             5.570000                         1.750398e+14                    6.600000                               1.500000                                     4                             ...
ext_5         42    çauà         1.906901e+15                    76         5.753438e+14             17.249799         1.393346e+15                    39              0.308723              -0.691100               ...                                          -0.037289                             1.034720                            1.000000                        1.713112e+14                             4.033333                         2.975064e+14                    5.333333                               1.000000                                     2                             ...
ext_6         38  France         2.537973e+15                   477         3.450911e+14             17.026519         1.011224e+15                    59              1.867307              -0.315863               ...                                          -0.088037                             0.697249                            2.625000                        8.857123e+01                             5.643750                         7.647689e+13                    5.562500                               1.687500                                     3                             ...
ext_7        264   China         1.853257e+15                   480         2.740526e+14             23.456719         1.034958e+15                    73              2.833009               0.699610               ...                                           0.003389                             0.673505                            1.952381                        8.658566e+13                             4.009524                         1.664640e+14                    4.523810                               1.666667                                     5                             ...
ext_8        718    çauà         1.067596e+15                   282         3.069124e+14             16.131689         1.067596e+15                    48              2.846050              -0.079664               ...                                           0.233214                             0.954317                            2.181818                        2.289887e+02                             5.054545                         1.197084e+14                    5.272727                               2.090909                                     5                             ...
ext_9        823  France         9.757210e+13                   317         2.805001e+13             18.644123         9.757210e+13                    57              2.846050              -0.003848               ...                                           0.258081                             0.555312                            2.545455                        1.485862e+02                             5.172727                         4.195574e+13                    5.727273                               1.545455                                     3                             ...

[10 rows x 110 columns]

ab-anssi commented 6 years ago

Thanks for your quick answer. It works!