alteryx / featuretools

An open source Python library for automated feature engineering
https://www.featuretools.com
BSD 3-Clause "New" or "Revised" License

AttributeError: 'float' object has no attribute 'total_seconds' #265

Closed: alexelgier closed this issue 6 years ago

alexelgier commented 6 years ago

I've been getting an error while running calculate_feature_matrix. I'm using the branch that was suggested to me in https://github.com/Featuretools/featuretools/issues/252, which relates to a different problem.

It was suggested that perhaps one of my entities' time index was not a datetime type, or that there might be missing values in that column, but I've checked my entityset and that's not the case. Sadly, I can't share the data for legal reasons, although at the moment I'm trying to reproduce the issue with an entityset filled with random data.
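
For reference, a minimal version of that check (assuming the pre-1.0 Entity API: EntitySet.entities, Entity.time_index, Entity.df; es stands in for the entityset) looks like:

    import pandas as pd

    # Sketch: verify that every entity's time index is a datetime column
    # with no missing values. Attribute names assume the pre-1.0
    # featuretools Entity API; "es" is the EntitySet under test.
    def check_time_indexes(es):
        for entity in es.entities:
            if entity.time_index is None:
                continue
            col = entity.df[entity.time_index]
            print(entity.id, col.dtype, "missing:", int(col.isnull().sum()))
            assert pd.api.types.is_datetime64_any_dtype(col), entity.id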

Any help debugging or finding the root of this problem would be appreciated.

Regards, Alex

Stacktrace:

File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 258, in calculate_feature_matrix
    pass_columns=pass_columns)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 520, in linear_calculate_chunks
    backend=backend)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 342, in calculate_chunk
    training_window=window)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/utils.py", line 34, in wrapped
    r = method(*args, **kwargs)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 316, in calc_results
    profile=profile)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/pandas_backend.py", line 196, in calculate_all_features
    result_frame = handler(group, input_frames)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/pandas_backend.py", line 313, in _calculate_transform_features
    values = feature_func(*variable_data)
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/primitives/transform_primitive.py", line 207, in pd_diff
    return grouped_df[bf_name].apply(lambda x: x.total_seconds())
  File "/home/mlgroup/NRM/venv/lib/python3.6/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/primitives/transform_primitive.py", line 207, in 
    return grouped_df[bf_name].apply(lambda x: x.total_seconds())
AttributeError: 'float' object has no attribute 'total_seconds'
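
For context on the last frame: pd_diff calls .total_seconds() on every element of the diffed column, so a single float NaN in that column is enough to trigger the error. A minimal sketch of that failure mode (not the featuretools code itself):

    import numpy as np
    import pandas as pd

    # Once a diffed time column holds a float NaN instead of a Timedelta,
    # the element-wise .total_seconds() call fails in the same way.
    values = pd.Series([pd.Timedelta(seconds=30), np.nan], dtype=object)
    values.apply(lambda x: x.total_seconds())
    # AttributeError: 'float' object has no attribute 'total_seconds'
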
alexelgier commented 6 years ago

I've been able to replicate the problem using randomized data. I'll be posting the code shortly.

alexelgier commented 6 years ago

Here you can download a ZIP file with the data in .parquet format and a main.py that reproduces the error.

https://www.sendspace.com/file/un5s6l

Also included is the requirements.txt for creating the venv. Note that I'm not using the trunk of Featuretools: pip install -e git://github.com/featuretools/featuretools.git@handle-empty-baseframe#egg=featuretools

kmax12 commented 6 years ago

@alexelgier Thank you very much. We'll take a look as soon as we can and get back to you.

kmax12 commented 6 years ago

@alexelgier We were able to run your code and reproduce the error. A fix will be coming shortly.

kmax12 commented 6 years ago

@alexelgier can you test with the branch named return-type? If this works for you, we will merge PR #266 into master for the next Featuretools release.
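
For anyone following along, the branch can be installed the same way as the one above, with the branch name swapped: pip install -e git://github.com/featuretools/featuretools.git@return-type#egg=featuretools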

alexelgier commented 6 years ago

It seems to have worked (calculate_feature_matrix executed properly), but later, when I tried to convert the resulting dataframe to a Spark dataframe, I got a type error, which leads me to believe there may still be some problem with the typing.

TypeError: field contractlength: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>
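
That error usually means the pandas column holds a mix of Python strings and floats, so Spark's schema inference sees two conflicting types. A possible workaround (assuming contractlength should be numeric; feature_matrix stands in for the calculate_feature_matrix output):

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Workaround sketch: coerce the mixed-type column to a single numeric
    # dtype before handing the pandas frame to Spark, so schema inference
    # only sees one type. Casting to numeric is an assumption about what
    # the column should contain.
    feature_matrix["contractlength"] = pd.to_numeric(
        feature_matrix["contractlength"], errors="coerce"
    )
    spark_df = spark.createDataFrame(feature_matrix)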

In any case, the particular error I was having has been fixed. Thanks!

kmax12 commented 6 years ago

@alexelgier That seems like it might be an issue on Spark's end, but let us know if you discover anything that suggests otherwise.

Closing this issue; fixed by #266.