alteryx / autonormalize

python library for automated dataset normalization
https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/
BSD 3-Clause "New" or "Revised" License
111 stars 16 forks

an error I do not understand #19

Open albangabillon opened 4 years ago

albangabillon commented 4 years ago

Unable to add relationship because LotArea_LandContour in LotArea_LandContour is Pandas dtype int32 and LotArea_LandContour in index is Pandas dtype int64.

rwedge commented 4 years ago

Hi @albangabillon, thanks for the error report.

Could you post the full stack trace of the error you encountered?

albangabillon commented 4 years ago

    housing_df = load_housing_data("train.csv")
    housing_df = housing_df.drop(columns=housing_df.columns[10:], axis=1)
    an.auto_entityset(housing_df, accuracy=1, name="esHousing")

Output:

    100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 33.33it/s]

    ValueError                                Traceback (most recent call last)
    in
    ----> 1 an.auto_entityset(housing_df, accuracy=1, name="esHousing")

    c:\users\alban\anaconda3\envs\geron\lib\site-packages\autonormalize\autonormalize.py in auto_entityset(df, accuracy, index, name, time_index)
        133         entityset (ft.EntitySet) : created entity set
        134     """
    --> 135     return make_entityset(df, find_dependencies(df, accuracy, index), name, time_index)
        136
        137

    c:\users\alban\anaconda3\envs\geron\lib\site-packages\autonormalize\autonormalize.py in make_entityset(df, dependencies, name, time_index)
        108             relationships.append((child.index[0], child.index[0], current.index[0], child.index[0]))
        109
    --> 110     return ft.EntitySet(name, entities, relationships)
        111
        112

    c:\users\alban\anaconda3\envs\geron\lib\site-packages\featuretools\entityset\entityset.py in __init__(self, id, entities, relationships)
         86             child_variable = self[relationship[2]][relationship[3]]
         87             self.add_relationship(Relationship(parent_variable,
    ---> 88                                                child_variable))
         89         self.reset_data_description()
         90

    c:\users\alban\anaconda3\envs\geron\lib\site-packages\featuretools\entityset\entityset.py in add_relationship(self, relationship)
        265         if not is_dtype_equal(parent_dtype, child_dtype):
        266             raise ValueError(msg.format(parent_v, parent_e.id, parent_dtype,
    --> 267                                         child_v, child_e.id, child_dtype))
        268
        269         self.relationships.append(relationship)

    ValueError: Unable to add relationship because LandContour_LotArea in LandContour_LotArea is Pandas dtype int32 and LandContour_LotArea in index is Pandas dtype int64.

rwedge commented 4 years ago

It looks to be an issue with handling the underlying datatypes while creating the entityset. Are you able to share this data so I could try to replicate?
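For readers following along, the check that fails in the traceback can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: it assumes the `is_dtype_equal` seen in the traceback is the one from `pandas.api.types`, and the two Series are hypothetical stand-ins for the key columns, not the real housing data. The traceback paths point to a Windows machine, and NumPy releases before 2.0 use a 32-bit default integer there, so that is one possible source of the `int32` side. Nothing in this thread confirms it.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_dtype_equal

# Hypothetical stand-ins for the two key columns named in the error message.
parent_key = pd.Series(np.arange(3, dtype="int32"), name="LandContour_LotArea")
child_key = pd.Series(np.arange(3, dtype="int64"), name="LandContour_LotArea")

# The values are identical, but the dtypes differ, so the equality check fails.
before_cast = is_dtype_equal(parent_key.dtype, child_key.dtype)
print(before_cast)  # False

# Casting one side to the other's dtype makes the same check pass.
parent_key = parent_key.astype(child_key.dtype)
after_cast = is_dtype_equal(parent_key.dtype, child_key.dtype)
print(after_cast)  # True
```

Whether such a cast belongs in user code or inside the library is a separate question. The snippet only shows why two columns with equal values can still be rejected.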

albangabillon commented 4 years ago

Hello,

The data comes from the beginner competition on Kaggle. I have attached the file.

Regards

Alban Gabillon, University Professor of Computer Science. Tel.: (+689) 40 80 38 80 (GMT -10:00). B.P. 6570 - 98702 Faa’a - Tahiti - French Polynesia

rwedge commented 4 years ago

Hi, I didn't get the attached file but I presume it was the "House Prices: Advanced Regression Techniques" contest.

Two more questions:

  1. Is load_housing_data("train.csv") a pd.read_csv call?
  2. Can you run
    featuretools info

    from the command line and share the output? It'll help make sure I've got the same environment for testing.
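
If the command-line tool is not available, a rough stand-in can be put together with the standard library alone. This is a hypothetical helper, not part of featuretools or autonormalize: the function name `collect_environment` and the package list are assumptions made for illustration. It also reports the width of the platform's C `long`, since NumPy releases before 2.0 size their default integer from it (32 bits on Windows, 64 bits on most 64-bit Linux and macOS systems).

```python
import ctypes
import platform
import sys
from importlib import metadata


def collect_environment(packages=("featuretools", "autonormalize", "pandas", "numpy")):
    """Return a dict of platform details and installed package versions.

    Hypothetical helper for bug reports; not a featuretools API.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Width of the C long type in bits; NumPy < 2.0 uses it for its default integer.
        "c_long_bits": ctypes.sizeof(ctypes.c_long) * 8,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives roughly the same information, though the output of `featuretools info` itself is preferable when it can be run.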

Thanks!