Closed by ab-anssi 6 years ago
@ab-anssi the problem here is that relationships must be one-to-many. In your code, the relationship between `main` and `ips` is one-to-one, since it links the index columns of both entities.
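To make the one-to-many constraint concrete, here is a minimal sketch in plain Python that classifies a prospective link by looking at the key values on each side. The function name `relationship_kind` and the sample keys are hypothetical, not part of featuretools or of the attached code; the check only looks at the data, so a genuinely one-to-many link whose parents currently have one child each would also be reported as one-to-one.

```python
from collections import Counter


def relationship_kind(parent_keys, child_keys):
    """Classify a prospective parent -> child link by key cardinality.

    Hypothetical helper for illustration; it accepts any iterable of keys
    (a list, or a pandas column).
    """
    parent_counts = Counter(parent_keys)
    child_counts = Counter(child_keys)
    # The parent side must identify each row uniquely.
    if any(n > 1 for n in parent_counts.values()):
        return "invalid: parent key is not unique"
    # At least one parent key repeats on the child side.
    if any(n > 1 for n in child_counts.values()):
        return "one-to-many"
    # Both sides are unique, e.g. two tables linked on their index columns.
    return "one-to-one"


# Made-up key columns standing in for the `ip` column of each table.
ips_keys = ["ext_0", "ext_1", "ext_2"]
main_keys = ["ext_0", "ext_1", "ext_2"]
flows_keys = ["ext_0", "ext_0", "ext_1", "ext_2", "ext_2"]

print(relationship_kind(ips_keys, main_keys))   # one-to-one
print(relationship_kind(ips_keys, flows_keys))  # one-to-many
```

Running a check like this on the linked columns before adding a relationship can show quickly which links are index-to-index.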
If I remove `main` and build features with `ips` as the target entity, it all runs. Here is my code: synthetic_flows_kmax12.zip
However, it looks like your label is in `main`, so I'd recommend either merging the two into one table before running DFS, if you need it there, or extracting the label separately after you get your feature matrix.
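Both options can be sketched with pandas alone. The frames below are made-up stand-ins for `ips` and `main`; the column names `AS_number` and `label` and the sample values are assumptions for illustration, and `feature_matrix` is only a placeholder for whatever DFS returns.

```python
import pandas as pd

# Hypothetical stand-ins for the two one-to-one tables (assumed columns).
ips = pd.DataFrame({"ip": ["ext_0", "ext_1", "ext_2"],
                    "AS_number": [710, 934, 43]})
main = pd.DataFrame({"ip": ["ext_0", "ext_1", "ext_2"],
                     "label": [0, 1, 0]})

# Option 1: merge into a single table before DFS.
# validate="one_to_one" raises a MergeError if either side has duplicate keys.
merged = ips.merge(main, on="ip", how="left", validate="one_to_one")

# Option 2: keep the label out of DFS and align it with the result afterwards.
feature_matrix = ips.set_index("ip")  # placeholder for the DFS output
labels = main.set_index("ip")["label"].reindex(feature_matrix.index)

print(merged)
print(labels)
```

With option 1, the label becomes an ordinary column of the merged table, so it should be excluded from feature generation; otherwise it may leak into the features.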
Does that help? Also, feel free to post questions in our Gitter chat room for faster replies.
If the attached code runs, you should see a feature matrix like this:
AS_number country SUM(flows.duration) SUM(flows.num_bytes) STD(flows.duration) STD(flows.num_bytes) MAX(flows.duration) MAX(flows.num_bytes) SKEW(flows.duration) SKEW(flows.num_bytes) ... MEAN(flows.SKEW(packets.num_bytes)) MEAN(flows.SKEW(packets.timestamp)) MEAN(flows.MIN(packets.num_bytes)) MEAN(flows.MIN(packets.timestamp)) MEAN(flows.MEAN(packets.num_bytes)) MEAN(flows.MEAN(packets.timestamp)) MEAN(flows.COUNT(packets)) MEAN(flows.NUM_UNIQUE(packets.flags)) NUM_UNIQUE(flows.MODE(packets.flags)) MODE(flows.MODE(packets.flags))
ip ...
ext_0 710 çauà 2.133139e+15 451 3.131739e+14 17.319461 1.090809e+15 55 2.760679 0.656015 ... 0.194575 0.763481 3.904762 2.432896e+13 5.847619 7.428209e+13 4.285714 1.523810 5 ...
ext_1 934 Russia 1.562295e+02 103 3.725285e+01 19.601587 9.842145e+01 58 0.059337 -0.050977 ... 0.112961 0.933054 5.333333 3.583729e+02 7.600000 4.002507e+13 5.333333 3.000000 3 ...
ext_2 43 France 5.783267e+15 692 1.158610e+15 18.212290 4.741678e+15 72 3.363278 0.418732 ... -0.094430 0.726827 1.500000 1.973225e+02 5.993750 3.610764e+13 7.187500 2.500000 6 ...
ext_3 836 France 2.015486e+02 131 4.185565e+01 23.893281 9.679447e+01 63 -0.707107 -0.683954 ... -0.130006 0.005472 4.000000 1.199045e+01 7.366667 4.065750e+01 7.000000 3.333333 3 ...
ext_4 459 China 6.547545e+15 707 1.206763e+15 17.289520 5.491191e+15 67 3.910612 0.336770 ... 0.175274 1.208717 2.250000 2.208043e+02 5.570000 1.750398e+14 6.600000 1.500000 4 ...
ext_5 42 çauà 1.906901e+15 76 5.753438e+14 17.249799 1.393346e+15 39 0.308723 -0.691100 ... -0.037289 1.034720 1.000000 1.713112e+14 4.033333 2.975064e+14 5.333333 1.000000 2 ...
ext_6 38 France 2.537973e+15 477 3.450911e+14 17.026519 1.011224e+15 59 1.867307 -0.315863 ... -0.088037 0.697249 2.625000 8.857123e+01 5.643750 7.647689e+13 5.562500 1.687500 3 ...
ext_7 264 China 1.853257e+15 480 2.740526e+14 23.456719 1.034958e+15 73 2.833009 0.699610 ... 0.003389 0.673505 1.952381 8.658566e+13 4.009524 1.664640e+14 4.523810 1.666667 5 ...
ext_8 718 çauà 1.067596e+15 282 3.069124e+14 16.131689 1.067596e+15 48 2.846050 -0.079664 ... 0.233214 0.954317 2.181818 2.289887e+02 5.054545 1.197084e+14 5.272727 2.090909 5 ...
ext_9 823 France 9.757210e+13 317 2.805001e+13 18.644123 9.757210e+13 57 2.846050 -0.003848 ... 0.258081 0.555312 2.545455 1.485862e+02 5.172727 4.195574e+13 5.727273 1.545455 3 ...
[10 rows x 110 columns]
Thanks for your quick answer. It works!
I am trying to extract features automatically with featuretools from synthetic data I have created. An error is raised when I add a particular relationship, `es.add_relationship(ips_flows)` (line 12 in synthetic_flows.py). The error says that the key `ip` does not exist, although it is present in the input data frame. Is it a bug? Or do the linked columns need to meet some specific constraints?
Context:
synthetic_flows.zip