Closed jrkinley-zz closed 6 years ago
pip freeze
alabaster==0.7.7 anaconda-client==1.4.0 anaconda-navigator==1.1.0 argcomplete==1.0.0 astropy==1.1.2 Babel==2.2.0 backports-abc==0.4 backports.shutil-get-terminal-size==1.0.0 backports.ssl-match-hostname==3.4.0.2 beautifulsoup4==4.4.1 bitarray==0.8.1 blaze==0.9.1 bokeh==0.11.1 boto==2.39.0 boto3==1.7.40 botocore==1.10.40 Bottleneck==1.0.0 cdecimal==2.3 cdsw==1.0.0 cffi==1.5.2 chest==0.2.3 click==6.7 cloudpickle==0.5.3 clyent==1.2.1 colorama==0.3.7 conda==4.0.5 conda-build==1.20.0 conda-env==2.4.5 conda-manager==0.3.1 configobj==5.0.6 cryptography==1.3 cycler==0.10.0 Cython==0.25.2 cytoolz==0.7.5 dask==0.19.1 datashape==0.5.1 decorator==4.3.0 dill==0.2.4 distributed==1.23.1 docopt==0.6.2 docutils==0.12 dynd==0.7.3.dev1 enum34==1.1.6 et-xmlfile==1.0.1 fastcache==1.0.2 featuretools==0.1.21 Flask==0.12 Flask-Cors==2.1.2 funcsigs==0.4 functools32==3.2.3.post2 future==0.16.0 futures==3.2.0 fuzzywuzzy==0.16.0 gevent==1.1.0 greenlet==0.4.9 grin==1.2.1 h5py==2.5.0 hdfs==2.1.0 HeapDict==1.0.0 ibis==1.6.0 ibis-framework==0.13.0 idna==2.0 impala==0.2 impyla==0.14.1 ipaddress==1.0.14 ipykernel==4.3.1 ipython==5.1.0 ipython-genutils==0.2.0 ipywidgets==4.1.1 itsdangerous==0.24 jdcal==1.2 jedi==0.9.0 Jinja2==2.10 jmespath==0.9.3 jsonschema==2.4.0 jupyter==1.0.0 jupyter-client==4.2.2 jupyter-console==4.1.1 jupyter-core==4.1.0 kudu-python==1.2.0 llvmlite==0.9.0 locket==0.2.0 lxml==3.6.0 MarkupSafe==1.0 matplotlib==2.0.0 mistune==0.7.2 mpmath==0.19 msgpack==0.5.6 multipledispatch==0.4.8 nbconvert==4.1.0 nbformat==4.4.0 networkx==1.11 nltk==3.2 nose==1.3.7 notebook==4.1.0 numba==0.24.0 numexpr==2.5 numpy==1.14.5 odo==0.4.2 openpyxl==2.3.2 pandas==0.23.1 pandas-datareader==0.2.1 partd==0.3.2 path.py==0.0.0 pathlib2==2.3.2 patsy==0.4.0 pep8==1.7.0 pexpect==4.6.0 pickleshare==0.7.4 Pillow==3.1.1 plotly==2.5.1 ply==3.8 prompt-toolkit==1.0.15 psutil==4.1.0 ptyprocess==0.5.2 py==1.4.31 py4j==0.10.7 pyasn1==0.1.9 pycairo==1.10.0 pycosat==0.6.1 pycparser==2.14 pycrypto==2.6.1 pycurl==7.19.5.3 
pyflakes==1.1.0 Pygments==2.2.0 Pympler==0.5 pyOpenSSL==0.15.1 pyparsing==2.2.0 pytest==2.8.5 python-dateutil==2.7.3 python-Levenshtein==0.12.0 pytz==2018.4 PyYAML==3.12 pyzmq==15.2.0 QtAwesome==0.3.2 qtconsole==4.2.0 QtPy==1.0 redis==2.10.3 regex==2018.2.21 requests==2.13.0 requests-file==1.4.3 rope==0.9.4 s3fs==0.1.5 s3transfer==0.1.13 sasl==0.2.1 scandir==1.7 scikit-image==0.12.3 scikit-learn==0.19.1 scipy==1.1.0 seaborn==0.8 simplegeneric==0.8.1 simplejson==3.10.0 singledispatch==3.4.0.3 six==1.11.0 snowballstemmer==1.2.1 sockjs-tornado==1.0.1 sortedcontainers==2.0.5 sphinx-rtd-theme==0.1.9 spyder==2.3.8 SQLAlchemy==1.0.12 statsmodels==0.6.1 subprocess32==3.5.2 sympy==1.0 tables==3.2.2 tblib==1.3.2 terminado==0.5 thrift==0.9.3 thrift-sasl==0.2.1 thriftpy==0.3.9 toolz==0.9.0 tornado==5.1 tqdm==4.23.4 traitlets==4.3.2 unicodecsv==0.14.1 wcwidth==0.1.7 Werkzeug==0.14.1 xlrd==0.9.4 XlsxWriter==0.8.4 xlwt==1.0.0 zict==0.1.3
Unfortunately I can't share any code or data, but to give you an idea, the entity set is composed of 7 entities and 6 relationships that share a common join key. The first 2 dataframes have a unique index that I specify when calling entity_from_dataframe. The other 5 dataframes don't have a unique index column, so I specify both index and make_index when calling entity_from_dataframe. This works ok in v0.1.21.
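For readers unfamiliar with make_index, here is a rough pandas-only sketch of the idea as I understand it: when a dataframe has no unique column, a new integer column is created under the name passed as index. The helper add_surrogate_index and the column names (join_key, value, row_id) are made up for illustration, and this is not featuretools' actual implementation.

```python
import pandas as pd


def add_surrogate_index(df, index_name):
    """Return a copy of df with a new unique integer column named index_name.

    Hypothetical helper: it only mimics the effect described for make_index.
    """
    if index_name in df.columns:
        raise ValueError(f"column {index_name!r} already exists")
    out = df.reset_index(drop=True)  # new frame with a fresh 0..n-1 row index
    out.insert(0, index_name, list(range(len(out))))
    return out


# Illustrative data: join_key repeats, so it cannot serve as a unique index.
events = pd.DataFrame({"join_key": [1, 1, 2], "value": [10.0, 12.5, 7.0]})
indexed = add_surrogate_index(events, "row_id")
```

The original dataframe is left untouched, so the same source data can be reused to build other entities.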
I don't think I'm doing anything out of the ordinary when calling dfs. I specify a couple of seed features, pass it the cutoff dates, and make use of drop_contains and drop_exact.
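To make the two drop options concrete, here is a plain-Python sketch of the filtering I assume they perform: drop_contains removes features whose name contains a given string, and drop_exact removes features whose name matches exactly. The function filter_feature_names and the feature names are hypothetical and only illustrate the behaviour, not the library's code.

```python
def filter_feature_names(names, drop_contains=(), drop_exact=()):
    """Keep names that match neither the substring nor the exact-name filters."""
    kept = []
    for name in names:
        if name in drop_exact:
            continue  # exact match: drop
        if any(fragment in name for fragment in drop_contains):
            continue  # substring match: drop
        kept.append(name)
    return kept


# Illustrative feature names, not taken from the real entity set.
names = ["SUM(orders.amount)", "TREND(orders.amount, order_date)", "customer_age"]
kept = filter_feature_names(
    names, drop_contains=["TREND("], drop_exact=["customer_age"]
)
```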
If I'm interpreting the error correctly, it looks like _calculate_agg_features expects the underlying dataframes to have a multi-level index, given to_merge.reset_index(1, drop=True, inplace=True), and it's failing because the specific dataframe has only a single level (level 0).
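The failure mode can be reproduced with pandas alone. The sketch below uses made-up frames (single, multi) rather than anything from the real entity set, and drop_level_if_present is a hypothetical guard, not the fix that went into featuretools.

```python
import pandas as pd

single = pd.DataFrame({"x": [1, 2, 3]}, index=pd.Index(["a", "b", "c"], name="id"))
multi = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("b", 0), ("c", 1)], names=["id", "time"]
    ),
)

# Dropping level 1 works when the index really has two levels.
flattened = multi.reset_index(level=1, drop=True)

# On a single-level index the same call raises IndexError.
try:
    single.reset_index(level=1, drop=True)
    raised = False
except IndexError:
    raised = True


def drop_level_if_present(df, level=1):
    """Hypothetical guard: only drop the level when the index has it."""
    if df.index.nlevels > level:
        return df.reset_index(level=level, drop=True)
    return df
```

Checking df.index.nlevels before the call is one way to confirm which frame is arriving with a single-level index.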
@jrkinley I cannot reproduce, but after looking at the code, I was able to refactor our implementation to not require the reset index in #250, which results in cleaner code and may resolve your problem.
Can you try to install that branch of featuretools and run your code? You can install that branch using pip with this command
pip install -e git://github.com/featuretools/featuretools.git@clean-agg-merge#egg=featuretools
Let us know if it helps!
@kmax12 Thanks for the patch. Unfortunately it results in another error:
> ...
> calculate_feature_matrix.py in calc_results (316)
> pandas_backend.py in calculate_all_features (196)
> pandas_backend.py in _calculate_agg_features (486)
> ...
KeyError: u'TREND(<entity>.<variable>, <time_index>)'
@kmax12, your change appears to have got past the point where the first IndexError was thrown. The new KeyError is being thrown when checking if any of the features in the dataframe are of boolean type. At this point the feature in question appears to be missing.
...
frame[f.get_name()].dtype.name in ['object', 'bool']):
...
@jrkinley thanks for looking into this. if you remove the trend primitive does the error still occur? can you tell if other features are missing?
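One way to answer the "which features are missing" question is to compare the expected feature names against the columns of the frame before indexing into it. This is only a sketch: missing_columns is a hypothetical helper, and the frame and feature names are invented stand-ins for the real ones.

```python
import pandas as pd


def missing_columns(frame, expected_names):
    """Return the expected column names that are absent from frame."""
    return [name for name in expected_names if name not in frame.columns]


# Illustrative frame and feature names, not taken from the real entity set.
frame = pd.DataFrame({"SUM(orders.amount)": [1.0, 2.0]})
expected = ["SUM(orders.amount)", "TREND(orders.amount, order_date)"]
absent = missing_columns(frame, expected)

# Restrict the dtype check to columns that exist, which avoids the KeyError.
object_or_bool = [
    name
    for name in expected
    if name in frame.columns and frame[name].dtype.name in ["object", "bool"]
]
```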
I am having a similar problem. I was getting the error "IndexError: Too many levels: Index has only 1 level, not 2", and after installing this branch I am getting a KeyError on a TIME_SINCE_PREVIOUS feature. Removing TIME_SINCE_PREVIOUS from the primitives I'm using didn't help, as I started getting a KeyError with TIME_SINCE_LAST. After removing that one, I started getting a KeyError on TREND.
Any help would be appreciated, as it seems this isn't just happening to me.
@alexelgier thanks. we are able to reproduce and are looking into it now
@jrkinley @alexelgier can you try the branch handle-empty-baseframe and see if it solves your problem?
thanks again for helping us out!
pip install -e git://github.com/featuretools/featuretools.git@handle-empty-baseframe#egg=featuretools
I've installed the branch and am running the code currently, will let you know if it works =) Thanks so much for the quick response
We're no longer getting the IndexError nor the KeyError, but now we're getting an AttributeError:
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 258, in calculate_feature_matrix
pass_columns=pass_columns)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 520, in linear_calculate_chunks
backend=backend)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 342, in calculate_chunk
training_window=window)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/utils.py", line 34, in wrapped
r = method(*args, *kwargs)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/calculate_feature_matrix.py", line 316, in calc_results
profile=profile)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/pandas_backend.py", line 196, in calculate_all_features
result_frame = handler(group, input_frames)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/computational_backends/pandas_backend.py", line 313, in _calculate_transform_features
values = feature_func(variable_data)
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/primitives/transform_primitive.py", line 207, in pd_diff
return grouped_df[bf_name].apply(lambda x: x.total_seconds())
File "/home/mlgroup/NRM/venv/lib/python3.6/site-packages/pandas/core/series.py", line 3194, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
File "/home/mlgroup/NRM/venv/src/featuretools/featuretools/primitives/transform_primitive.py", line 207, in
@alexelgier this issue looks like you have an incorrect underlying datatype for the datetime column used by a TimeSincePrevious feature. Can you check that the time index in each entity is all datetimes and has no NaN values?
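A quick way to run that check with pandas is sketched below. The helper check_time_index and the example columns (parsed, raw) are hypothetical; the idea is simply to report the dtype and the number of missing values for whichever column serves as the time index.

```python
import pandas as pd


def check_time_index(df, column):
    """Report whether column is a datetime dtype and how many values are missing."""
    series = df[column]
    return {
        "is_datetime": bool(pd.api.types.is_datetime64_any_dtype(series)),
        "n_missing": int(series.isna().sum()),
    }


# Illustrative data: one column parsed as datetimes, one left as strings.
df = pd.DataFrame(
    {
        "parsed": pd.to_datetime(["2018-01-01", "2018-01-02", None]),
        "raw": ["2018-01-01", "2018-01-02", None],
    }
)
parsed_report = check_time_index(df, "parsed")
raw_report = check_time_index(df, "raw")
```

A column that reports is_datetime as False, or a non-zero n_missing, would be the first place to look.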
@kmax12 the branch is working for me. Thanks for your help!
I've checked the EntitySet and the data seems ok. Is there any other reason I might be getting this error?
@alexelgier can you share your data or a reproducible example? you can email us at help@featuretools.com.
Sadly because of legal issues I cannot share the data I'm working on.
I've checked the EntitySet and all the time indexes in my entities are of type datetime_time_index and have no missing values.
Is there any other reason I might be getting this error? Perhaps you could further suggest how I could debug this.
@alexelgier the problem here appears to be with the TimeSincePrevious primitive. Can you open another issue for this discussion?
Will do. Thanks for the help!
Hi, is this branch still available?
Did not find branch or tag 'handle-empty-baseframe', assuming revision or ref.
error: pathspec 'handle-empty-baseframe' did not match any file(s) known to git.
@pabloazurduy it's been merged into master, so it should be in the latest release, v0.5.1, of featuretools. if you're still hitting an error, please open a new issue
Featuretools' dfs() method fails to run on my entity set after upgrading from v0.1.21 to v0.2.x and v0.3.0. The error is raised when the Pandas backend tries to calculate the aggregate features in _calculate_agg_features(). In particular:
--> 442 to_merge.reset_index(1, drop=True, inplace=True)
...
IndexError: Too many levels: Index has only 1 level, not 2
This is working fine in v0.1.x and the entity set hasn't changed after the upgrade. The entity set is composed of 7 entities and 6 relationships. Each entity (dataframe) is added via entity_from_dataframe.