Open eddyfathi opened 2 weeks ago
import featuretools as ft

# `es` is your existing entity set
es = ft.EntitySet(id="your_entity_set")
es = es.entity_from_dataframe(
    entity_id="secondary_table",
    dataframe=secondary_df,  # your secondary dataframe
    index="secondary_id",  # primary key of the secondary table
    time_index="valid_from",  # use valid_from as the time_index
)
# ft.Relationship takes the parent (primary key) column first, then the child (foreign key) column
relationship = ft.Relationship(
    es["secondary_table"]["secondary_id"],  # Primary key in secondary table
    es["target_table"]["target_id"],  # Foreign key in target table
)
es = es.add_relationship(relationship)
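# Drop rows where valid_to < cutoff
def filter_valid_rows(df, cutoff_time):
    return df[df["valid_to"] >= cutoff_time]

# cutoff_time is a single timestamp that you choose. Run this before the
# entity_from_dataframe call above so the entity is built from the filtered rows.
secondary_df = filter_valid_rows(secondary_df, cutoff_time)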
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="target_table",
    cutoff_time=cutoff_times_df,  # DataFrame containing cutoffs for each instance
    features_only=False,
)
This should help. If you have any questions, feel free to reach out to me.
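The filter above uses one cutoff for the whole table. When each instance has its own cutoff, the same idea can be applied per instance in plain pandas before the data is loaded into the entity set. This is only a rough sketch: the helper `rows_valid_at_cutoff`, the join column `item_id`, the cutoff column `time`, and the toy data are all assumptions made for illustration, not part of featuretools or of the original dataset.

```python
import pandas as pd


def rows_valid_at_cutoff(secondary_df, cutoffs_df, key):
    """Pair each cutoff with the secondary rows whose validity window contains it."""
    merged = cutoffs_df.merge(secondary_df, on=key, how="left")
    in_window = (merged["valid_from"] <= merged["time"]) & (
        merged["time"] <= merged["valid_to"]
    )
    return merged[in_window].reset_index(drop=True)


# Toy data, made up for illustration only
secondary_df = pd.DataFrame(
    {
        "item_id": [1, 1, 2],  # hypothetical join column
        "rating": [3, 5, 4],
        "valid_from": pd.to_datetime(["2023-01-01", "2023-03-01", "2023-02-01"]),
        "valid_to": pd.to_datetime(["2023-02-28", "2023-12-31", "2023-06-30"]),
    }
)
cutoffs_df = pd.DataFrame(
    {
        "item_id": [1, 2],
        "time": pd.to_datetime(["2023-04-15", "2023-07-15"]),  # one cutoff per instance
    }
)

valid = rows_valid_at_cutoff(secondary_df, cutoffs_df, key="item_id")
print(valid[["item_id", "rating", "valid_from", "valid_to"]])
```

In this toy data only the second rating for item 1 has a window that contains its cutoff. Item 2 has no rating valid at its cutoff, so it drops out of the result.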
I am working on a dataset with multiple tables, and I am using the featuretools library for feature engineering. One of the tables, which is NOT the target dataframe, has several columns. Three of these columns are relevant to this question: ['rating', 'valid_from', 'valid_to']. I use valid_from as the time_index, but I am not sure how to incorporate the valid_to column. If this were the target dataframe, I could have used valid_to as the cutoff times, but since it is not, I don't know how to set up the problem so that there is no data leakage.
I also thought of using valid_to as the time_index, but then I am not sure how to incorporate the valid_from column.
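One way to picture the leakage concern: at a given cutoff, a row is visible once its valid_from is on or before the cutoff, but a valid_to that lies after the cutoff may not have been recorded yet at that point. The pandas sketch below assumes exactly that semantics, which may or may not match how valid_to is populated in the real data. The helper `as_known_at` and the toy values are hypothetical.

```python
import pandas as pd


def as_known_at(df, cutoff):
    """Return the rows visible at `cutoff`, hiding end dates that lie in the future."""
    cutoff = pd.Timestamp(cutoff)
    visible = df[df["valid_from"] <= cutoff].copy()
    # Assumption: an end date after the cutoff was not yet known at the cutoff
    visible.loc[visible["valid_to"] > cutoff, "valid_to"] = pd.NaT
    return visible


# Toy data, made up for illustration only
ratings = pd.DataFrame(
    {
        "rating": [3, 5, 4],
        "valid_from": pd.to_datetime(["2023-01-01", "2023-03-01", "2023-06-01"]),
        "valid_to": pd.to_datetime(["2023-02-28", "2023-05-31", "2023-12-31"]),
    }
)

snapshot = as_known_at(ratings, "2023-04-15")
print(snapshot)
```

Here the third rating starts after the cutoff and is dropped, while the second rating is kept with its future end date blanked out.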