alteryx / featuretools

An open source python library for automated feature engineering
https://www.featuretools.com
BSD 3-Clause "New" or "Revised" License

How to set a relationship between two columns that are not indexes? #531

Closed xuehui1991 closed 5 years ago

xuehui1991 commented 5 years ago

I want to set the relationship of two columns in two tables. For example:

# Relationship between clients and previous loans
r_client_previous = ft.Relationship(es['clients']['client_att'],
                                    es['loans']['client_att'])

# Add the relationship to the entity set
es = es.add_relationship(r_client_previous)

But es['clients']['client_att'] and es['loans']['client_att'] are not the indexes of the 'clients' and 'loans' dataframes, which means these columns may contain repeated values and are not unique. Adding the relationship then raises an error.

So does Featuretools support setting a relationship between two non-index columns?

Thank you!

kmax12 commented 5 years ago

In a parent-child relationship, only the parent has to have a unique index. In this case, the column in the clients dataframe should be unique, but the column in the loans dataframe won't be unique, because one client might have many previous loans. If your clients entity doesn't have one row per client, you will likely want to rearrange your data so that it does.
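As a rough illustration of that rearrangement, here is a minimal pandas sketch. The data and the `income`, `loan_id`, and `amount` columns are hypothetical, and it assumes that duplicate client rows are exact repeats that can simply be dropped. If the repeated rows differ, they would need to be aggregated instead. It only prepares and checks the dataframes; building the entity set and adding the relationship would follow as in the question.

```python
import pandas as pd

# Hypothetical data: client_att repeats in the clients table,
# so it cannot serve as the parent index as-is.
clients = pd.DataFrame({
    "client_att": [1, 1, 2, 3, 3],
    "income": [100, 100, 250, 80, 80],
})
loans = pd.DataFrame({
    "loan_id": [10, 11, 12, 13],
    "client_att": [1, 1, 2, 3],
    "amount": [500, 700, 300, 900],
})

# Keep one row per client so client_att is unique on the parent side.
clients_unique = clients.drop_duplicates(subset="client_att").reset_index(drop=True)

# Parent column must be unique; the child column is expected to repeat.
parent_is_unique = clients_unique["client_att"].is_unique
child_is_unique = loans["client_att"].is_unique

# Sanity check: loans that reference a client missing from the parent table.
orphans = loans[~loans["client_att"].isin(clients_unique["client_att"])]

print(parent_is_unique, child_is_unique, len(orphans))
```

With `clients_unique` as the parent, `client_att` can be used as the index of the clients entity while the loans entity keeps its own index (here the hypothetical `loan_id`).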

Closing this issue for now. If you have other questions about using Featuretools, please post on our Slack or on Stack Overflow.