alteryx / ta1-primitives

d3m 2019.1.21 #4

Closed (csala closed 5 years ago)

csala commented 5 years ago

Adapt to new d3m primitive naming schema and add demo Pipelines.

The main changes are:

  1. The featuretools primitives are now selected through boolean hyperparameters such as aggregation_sum=True/False or transform_day=True/False. These hyperparameters are generated dynamically, using the output of featuretools.list_primitives() as the reference. By default, only the primitives that featuretools.dfs itself uses by default are set to True.
  2. The naming schema has changed to be compliant with the new rules. Now the DFS primitive is called d3m.primitives.feature_creation.deep_feature_synthesis.Featuretools.
  3. All the primitives are available in the list featuretools_ta1.PRIMITIVES (for now there is only one).
  4. The DFS primitive has a method get_demo_pipeline that builds a basic pipeline that showcases the usage of the primitive. If we add new primitives, they should also implement this method.
  5. There are two new scripts inside the scripts folder. One generates the JSON annotations, including primitive.json, pipeline.json and pipeline.meta. The other runs the demo pipeline on a particular dataset and automatically downloads that dataset from S3 if it is not found locally.
  6. New JSON annotations, ready to be added to the primitives_repo, are included.
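
To illustrate point 1, here is a minimal sketch of how per-primitive boolean flags could be built and then turned back into primitive lists. It is not the code in this PR: the sample primitives, the ASSUMED_DEFAULTS set and the helper names build_flags and split_enabled are all hypothetical, and a hard-coded list stands in for the "name" and "type" columns of featuretools.list_primitives() so that the snippet runs without featuretools installed.

```python
# Sketch only: the real primitive derives its hyperparameters from
# featuretools.list_primitives(); a small hard-coded sample stands in here.

# (name, type) pairs, mirroring the "name" and "type" columns of the
# list_primitives() output. The selection below is illustrative.
SAMPLE_PRIMITIVES = [
    ("sum", "aggregation"),
    ("mean", "aggregation"),
    ("last", "aggregation"),
    ("day", "transform"),
    ("negate", "transform"),
]

# Hypothetical defaults; the real ones are whatever featuretools.dfs uses.
ASSUMED_DEFAULTS = {"sum", "mean", "day"}


def build_flags(primitives, defaults):
    """Return one boolean flag per primitive, named '<type>_<name>'."""
    return {
        "{}_{}".format(ptype, name): name in defaults
        for name, ptype in primitives
    }


def split_enabled(flags):
    """Collect the enabled primitive names into aggregation and transform lists."""
    agg, trans = [], []
    for key, enabled in flags.items():
        # Split on the first underscore only, so names such as
        # "is_weekend" survive intact.
        ptype, _, name = key.partition("_")
        if enabled:
            (agg if ptype == "aggregation" else trans).append(name)
    return agg, trans


flags = build_flags(SAMPLE_PRIMITIVES, ASSUMED_DEFAULTS)
flags["transform_negate"] = True  # a user switches one extra primitive on
agg_primitives, trans_primitives = split_enabled(flags)
```

The two resulting lists have the shape of the agg_primitives and trans_primitives arguments of featuretools.dfs, which is presumably where the enabled names end up.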

kmax12 commented 5 years ago

the PR is looking good. some comments

sum
last
percent_true
n_most_common
num_true
std
num_unique
skew
time_since_last
mode
max
median
avg_time_between
min
mean
count
second
day
characters
absolute
latitude
is_weekend
month
minute
hour
numwords
weekday
percentile
year
negate
week

csala commented 5 years ago

I added the list of primitives as indicated and also upgraded to featuretools 0.6.0. The featuretools version, btw, is listed in the setup.py module.