Closed vkocaman closed 7 years ago
Hi @vkocaman
It's nearly impossible to debug without a stack trace. Can you please copy/paste the error message, along with all the other output that could help us debug?
As a general best practice, I like training on a small sample of the dataset (say, 1%) to make sure that things work before training on the entire dataset.
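With pandas, sampling a small fraction for that smoke test is a one-liner (a sketch with a hypothetical DataFrame; `frac` and `random_state` are standard `DataFrame.sample` parameters):

```python
import pandas as pd

# hypothetical dataset; sample ~1% for a quick smoke test before the full run
df = pd.DataFrame({'x': range(1000), 'y': range(1000)})
df_small = df.sample(frac=0.01, random_state=42)  # 10 of 1000 rows
```

Training on `df_small` first surfaces dtype and column-description problems in seconds rather than hours.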
You might also fix this issue by upgrading all of your libraries with `pip install --upgrade auto_ml` and `pip install --upgrade lightgbm`.
If that doesn't fix it, could you please also include the output of `pip freeze`?
Here is the complete error message:
```
TypeError                                 Traceback (most recent call last)
/Users/vkocaman/anaconda/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     56     try:
---> 57         return getattr(obj, method)(*args, **kwds)
     58
TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
```
I already made sure that all packages, including auto_ml, are up to date. Here is the `pip freeze` output:
```
alabaster==0.7.10 anaconda-client==1.6.3 anaconda-navigator==1.6.2 anaconda-project==0.6.0 appnope==0.1.0 appscript==1.0.1 asn1crypto==0.22.0 astroid==1.4.9 astropy==1.3.2 auto-ml==2.7.6 auto-sklearn==0.2.1 Babel==2.4.0 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.6.0 bitarray==0.8.1 blaze==0.10.1 bleach==1.5.0 bokeh==0.12.5 boto==2.46.1 Bottleneck==1.2.1 cffi==1.10.0 chardet==3.0.3 click==6.7 cloudpickle==0.2.2 clyent==1.2.2 colorama==0.3.9 ConfigSpace==0.3.10 contextlib2==0.5.5 cryptography==1.8.1 cycler==0.10.0 Cython==0.27.1 cytoolz==0.8.2 dask==0.14.3 datashape==0.5.4 deap==1.0.2 decorator==4.0.11 dill==0.2.7.1 distributed==1.16.3 docutils==0.13.1 entrypoints==0.2.2 et-xmlfile==1.0.1 fastcache==1.0.2 Flask==0.12.2 Flask-Cors==3.0.2 gevent==1.2.1 greenlet==0.4.12 h5py==2.7.1 HeapDict==1.0.0 html5lib==0.9999999 idna==2.5 imagesize==0.7.1 ipykernel==4.6.1 ipython==5.3.0 ipython-genutils==0.2.0 ipywidgets==6.0.0 isort==4.2.5 itsdangerous==0.24 jdcal==1.3 jedi==0.10.2 Jinja2==2.9.6 joblib==0.11 jsonschema==2.6.0 jupyter==1.0.0 jupyter-client==5.0.1 jupyter-console==5.1.0 jupyter-core==4.3.0 Keras==2.0.8 lazy-object-proxy==1.2.2 liac-arff==2.1.1 lightgbm==2.0.7 llvmlite==0.18.0 locket==0.2.0 lockfile==0.12.2 lxml==3.7.3 Markdown==2.6.9 MarkupSafe==0.23 matplotlib==2.0.2 mistune==0.7.4 mpmath==0.19 msgpack-python==0.4.8 multipledispatch==0.4.9 multiprocess==0.70.5 navigator-updater==0.1.0 nbconvert==5.1.1 nbformat==4.3.0 networkx==1.11 nltk==3.2.3 nose==1.3.7 notebook==5.0.0 numba==0.33.0 numexpr==2.6.2 numpy==1.13.3 numpydoc==0.6.0 odo==0.5.0 olefile==0.44 openpyxl==2.4.7 packaging==16.8 pandas==0.20.3 pandocfilters==1.4.1 partd==0.3.8 pathlib2==2.2.1 pathos==0.2.1 patsy==0.4.1 pep8==1.7.0 pexpect==4.2.1 pickleshare==0.7.4 Pillow==4.1.1 ply==3.10 pox==0.2.3 ppft==1.6.4.7.1 prompt-toolkit==1.0.14 protobuf==3.4.0 psutil==5.3.1 ptyprocess==0.5.1 py==1.4.33 pycosat==0.6.2 pycparser==2.17 pycrypto==2.6.1 pycurl==7.43.0 pyflakes==1.5.0 Pygments==2.2.0 pylint==1.6.4 pynisher==0.4.2 pyodbc==4.0.16 pyOpenSSL==17.0.0 pyparsing==2.1.4 pytest==3.0.7 python-dateutil==2.6.1 pytz==2017.2 PyWavelets==0.5.2 PyYAML==3.12 pyzmq==16.0.2 QtAwesome==0.4.4 qtconsole==4.3.0 QtPy==1.2.1 requests==2.14.2 rope-py3k==0.9.4.post1 scikit-image==0.13.0 scikit-learn==0.19.0 scikit-MDR==0.4.4 scipy==0.19.1 seaborn==0.7.1 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.11.0 sklearn==0.0 sklearn-deap2==0.2.1 skrebate==0.3.4 smac==0.6.0 snowballstemmer==1.2.1 sortedcollections==0.5.3 sortedcontainers==1.5.7 Sphinx==1.5.6 sphinx-rtd-theme==0.2.4 spyder==3.1.4 SQLAlchemy==1.1.9 statsmodels==0.8.0 stopit==1.1.1 sympy==1.0 tables==3.3.0 tabulate==0.8.1 tblib==1.3.2 tensorflow==1.3.0 tensorflow-tensorboard==0.1.8 terminado==0.6 testpath==0.3 toolz==0.8.2 tornado==4.5.1 TPOT==0.9.0 tqdm==4.19.1.post1 traitlets==4.3.2 typing==3.6.2 unicodecsv==0.14.1 update-checker==0.16 wcwidth==0.1.7 Werkzeug==0.12.2 widgetsnbextension==2.0.0 wrapt==1.10.10 xlrd==1.0.0 XlsxWriter==0.9.6 xlwings==0.10.4 xlwt==1.2.0 zict==0.1.2
```
Dear Preston,
To avoid retraining the whole model, is there any way to pass the best LightGBM parameters into the model directly, so that I don't need to optimize the final model again?
By the way, after making sure that all packages are up to date, I trained on just 1% of the training set, but nothing changed.
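One option, separate from anything auto_ml itself provides, is to persist the fitted model once and reload it later instead of retraining (a sketch using the standard library's pickle; `trained_model` is a placeholder dict standing in for the real fitted estimator):

```python
import pickle

# placeholder standing in for whatever fitted estimator/pipeline you trained
trained_model = {'num_leaves': 31, 'learning_rate': 0.05}

# persist it once after the long training run...
with open('model.pkl', 'wb') as f:
    pickle.dump(trained_model, f)

# ...then reload it later without retraining
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)
```

The same pattern works for just the best hyperparameter dict, which can then be fed back into a fresh estimator.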
the stack trace helps a lot, thanks!
it looks like you're probably trying to feed in a column of dtype float64 as a categorical column. when we try to convert that to a 32 bit string, it starts throwing an error because of the loss in precision.
my guess is you're probably feeding in some column like `user_id` or `order_id` as a categorical column. that would also explain why it takes a while to train. these columns should almost always be ignored, not used as categorical values.
i'll release a patch to handle this later tonight probably, but in the meantime, you can probably handle this yourself by just ignoring any categorical columns that are of dtype float64. or, convert those to a string yourself beforehand, and see if that handles it.
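That workaround can be sketched in pandas (hypothetical `df` and `column_descriptions`; the conversion loop is the point):

```python
import pandas as pd

# hypothetical data: user_id is numeric but declared categorical
df = pd.DataFrame({'user_id': [101.0, 102.0], 'price': [9.5, 3.2]})
column_descriptions = {'user_id': 'categorical'}

# convert any float64 column that is declared categorical to strings beforehand
for col, desc in column_descriptions.items():
    if desc == 'categorical' and df[col].dtype == 'float64':
        df[col] = df[col].astype(str)
```

After this, the categorical columns reach the pipeline as strings, so no float64-to-string cast happens internally.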
Thanks. Even though I specified the categorical columns in the column descriptions, I still got the error. I changed the types of the float64 columns to int and it produces predict_probas now, but they are just ones and zeros, which is not what I need. Anyway, thank you again.
yeah, that's because lightgbm released a breaking update, without any deprecation warnings. you can use the previous version of lightgbm (v2.0.6), or i'll have a new release ready later tonight that fixes it too.
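pinning back to the pre-breaking-change release is a one-line command:

```shell
pip install lightgbm==2.0.6
```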
thanks for filing all the issues!
alright, should be all handled in the latest release (v2.7.7).
if you run into lightgbm issues, and you're on the latest version, let me know. it's running fine for a couple projects i'm working on, and in the test suite, but i'm always open to learning how other people use things.
Thanks, nice work!
Hi all..
After long hours of training my model with lightgbm, I ran predict_proba and at first hit the data_rate_limit error in Jupyter. I raised that limit and had to train the model again, but this time I ran into another error:

```
Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
```

Can someone help me please? Thanks.
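For the data rate limit mentioned above, the usual workaround is to raise Jupyter's IOPub limit at launch (this is the standard `NotebookApp` option):

```shell
jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10
```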