nielsuit227 closed 3 years ago
Thank you for trying this out!
It would be great if we can require sklearn <2 instead of having to rely on <1.1
Do you understand the FutureWarning and where it is triggered within ppscore? And how might we fix it now?
I don't know where it happens in calculation.py. The warning is triggered by the last assertion in test_score. This uses a categorical variable 'Sex_object' in the input, which is fed as a string.
Understood - thank you for that insight. It would be highly appreciated if you could dig one level deeper to see where it occurs within ppscore/calculation.py, and to understand what we can do to prevent the warning entirely and stay future safe (or whether we cannot, e.g. if it originates in numpy).
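One generic way to find the call site is to promote the warning to an exception so that a full traceback is produced. The snippet below is a minimal sketch using only the standard library; `noisy_step` is a hypothetical stand-in for whatever call emits the warning, not actual ppscore code.

```python
import traceback
import warnings


def noisy_step():
    # Hypothetical stand-in for the library call that emits the warning.
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


def locate_future_warning(func):
    """Run func with FutureWarning promoted to an exception.

    Returns the traceback text if a FutureWarning was raised, else None.
    """
    with warnings.catch_warnings():
        # Inside this block every FutureWarning is raised as an exception.
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning:
            return traceback.format_exc()
    return None


trace = locate_future_warning(noisy_step)
print(trace)
```

The same idea should work for a whole test run by invoking pytest with `-W error::FutureWarning`, so that the failing test's traceback points at the line that triggered the warning.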
Hey, I copied the master branch to my own system and changed "scikit-learn>=0.20.2,<1.0.0" to "scikit-learn>=0.20.2,<2.0.0", and I did not get this error. Has any more work been done to find or fix this issue?
@JDM288 thanks for weighing in - can you share the pip list for your environment?
Package Version
absl-py 0.12.0 alabaster 0.7.12 albumentations 0.1.12 altair 4.1.0 appdirs 1.4.4 argcomplete 1.12.3 argon2-cffi 21.1.0 arviz 0.11.4 astor 0.8.1 astropy 4.3.1 astunparse 1.6.3 atari-py 0.2.9 atomicwrites 1.4.0 attrs 21.2.0 audioread 2.1.9 autograd 1.3 Babel 2.9.1 backcall 0.2.0 beautifulsoup4 4.6.3 bleach 4.1.0 blis 0.4.1 bokeh 2.3.3 Bottleneck 1.3.2 branca 0.4.2 bs4 0.0.1 CacheControl 0.12.10 cached-property 1.5.2 cachetools 4.2.4 catalogue 1.0.0 certifi 2021.10.8 cffi 1.15.0 cftime 1.5.1.1 chardet 3.0.4 charset-normalizer 2.0.7 click 7.1.2 cloudpickle 1.3.0 cmake 3.12.0 cmdstanpy 0.9.5 colorcet 2.0.6 colorlover 0.3.0 community 1.0.0b1 contextlib2 0.5.5 convertdate 2.3.2 coverage 3.7.1 coveralls 0.5 crcmod 1.7 cufflinks 0.17.3 cvxopt 1.2.7 cvxpy 1.0.31 cycler 0.11.0 cymem 2.0.6 Cython 0.29.24 daft 0.0.4 dask 2.12.0 datascience 0.10.6 debugpy 1.0.0 decorator 4.4.2 defusedxml 0.7.1 descartes 1.1.0 dill 0.3.4 distributed 1.25.3 dlib 19.18.0 dm-tree 0.1.6 docopt 0.6.2 docutils 0.18 dopamine-rl 1.0.5 earthengine-api 0.1.288 easydict 1.9 ecos 2.0.7.post1 editdistance 0.5.3 en-core-web-sm 2.2.5 entrypoints 0.3 ephem 4.1 et-xmlfile 1.1.0 fa2 0.3.5 fastai 1.0.61 fastdtw 0.3.4 fastprogress 1.0.0 fastrlock 0.8 fbprophet 0.7.1 feather-format 0.4.1 filelock 3.3.2 firebase-admin 4.4.0 fix-yahoo-finance 0.0.22 Flask 1.1.4 flatbuffers 2.0 folium 0.8.3 future 0.16.0 gast 0.4.0 GDAL 2.2.2 gdown 3.6.4 gensim 3.6.0 geographiclib 1.52 geopy 1.17.0 gin-config 0.5.0 glob2 0.7 google 2.0.3 google-api-core 1.26.3 google-api-python-client 1.12.8 google-auth 1.35.0 google-auth-httplib2 0.0.4 google-auth-oauthlib 0.4.6 google-cloud-bigquery 1.21.0 google-cloud-bigquery-storage 1.1.0 google-cloud-core 1.0.3 google-cloud-datastore 1.8.0 google-cloud-firestore 1.7.0 google-cloud-language 1.2.0 google-cloud-storage 1.18.1 google-cloud-translate 1.5.0 google-colab 1.0.0 google-pasta 0.2.0 google-resumable-media 0.4.1 googleapis-common-protos 1.53.0 googledrivedownloader 0.4 graphviz 0.10.1 
greenlet 1.1.2 grpcio 1.41.1 gspread 3.0.1 gspread-dataframe 3.0.8 gym 0.17.3 h5py 3.1.0 HeapDict 1.0.1 hijri-converter 2.2.2 holidays 0.10.5.2 holoviews 1.14.6 html5lib 1.0.1 httpimport 0.5.18 httplib2 0.17.4 httplib2shim 0.0.3 humanize 0.5.1 hyperopt 0.1.2 ideep4py 2.0.0.post3 idna 2.10 imageio 2.4.1 imagesize 1.3.0 imbalanced-learn 0.8.1 imblearn 0.0 imgaug 0.2.9 importlib-metadata 4.8.2 importlib-resources 5.4.0 imutils 0.5.4 inflect 2.1.0 iniconfig 1.1.1 intel-openmp 2021.4.0 intervaltree 2.1.0 ipykernel 4.10.1 ipython 5.5.0 ipython-genutils 0.2.0 ipython-sql 0.3.9 ipywidgets 7.6.5 itsdangerous 1.1.0 jax 0.2.21 jaxlib 0.1.71+cuda111 jdcal 1.4.1 jedi 0.18.0 jieba 0.42.1 Jinja2 2.11.3 joblib 1.1.0 jpeg4py 0.1.4 jsonschema 2.6.0 jupyter 1.0.0 jupyter-client 5.3.5 jupyter-console 5.2.0 jupyter-core 4.9.1 jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.2 kaggle 1.5.12 kapre 0.3.5 keras 2.7.0 Keras-Preprocessing 1.1.2 keras-vis 0.4.1 kiwisolver 1.3.2 korean-lunar-calendar 0.2.1 libclang 12.0.0 librosa 0.8.1 lightgbm 2.2.3 llvmlite 0.34.0 lmdb 0.99 LunarCalendar 0.0.9 lxml 4.2.6 Markdown 3.3.4 MarkupSafe 2.0.1 matplotlib 3.2.2 matplotlib-inline 0.1.3 matplotlib-venn 0.11.6 missingno 0.5.0 mistune 0.8.4 mizani 0.6.0 mkl 2019.0 mlxtend 0.14.0 more-itertools 8.11.0 moviepy 0.2.3.5 mpmath 1.2.1 msgpack 1.0.2 multiprocess 0.70.12.2 multitasking 0.0.9 murmurhash 1.0.6 music21 5.5.0 natsort 5.5.0 nbclient 0.5.8 nbconvert 5.6.1 nbformat 5.1.3 nest-asyncio 1.5.1 netCDF4 1.5.8 networkx 2.6.3 nibabel 3.0.2 nltk 3.2.5 notebook 5.3.1 numba 0.51.2 numexpr 2.7.3 numpy 1.19.5 nvidia-ml-py3 7.352.0 oauth2client 4.1.3 oauthlib 3.1.1 okgrade 0.4.3 opencv-contrib-python 4.1.2.30 opencv-python 4.1.2.30 openpyxl 2.5.9 opt-einsum 3.3.0 osqp 0.6.2.post0 packaging 21.2 palettable 3.3.0 pandas 1.1.5 pandas-datareader 0.9.0 pandas-gbq 0.13.3 pandas-profiling 1.4.1 pandocfilters 1.5.0 panel 0.12.1 param 1.12.0 parso 0.8.2 pathlib 1.0.1 patsy 0.5.2 pep517 0.12.0 pexpect 4.8.0 pickleshare 0.7.5 
Pillow 7.1.2 pip 21.1.3 pip-tools 6.2.0 plac 1.1.3 plotly 4.4.1 plotnine 0.6.0 pluggy 0.7.1 pooch 1.5.2 portpicker 1.3.9 ppscore 1.2.0 prefetch-generator 1.0.1 preshed 3.0.6 prettytable 2.4.0 progressbar2 3.38.0 prometheus-client 0.12.0 promise 2.3 prompt-toolkit 1.0.18 protobuf 3.17.3 psutil 5.4.8 psycopg2 2.7.6.1 ptyprocess 0.7.0 py 1.11.0 pyarrow 3.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycocotools 2.0.2 pycoingecko 2.2.0 pycparser 2.21 pyct 0.4.8 pydata-google-auth 1.2.0 pydot 1.3.0 pydot-ng 2.0.0 pydotplus 2.0.2 PyDrive 1.3.1 pyemd 0.5.1 pyerfa 2.0.0.1 pyglet 1.5.0 Pygments 2.6.1 pygobject 3.26.1 pymc3 3.11.4 PyMeeus 0.5.11 pymongo 3.12.1 pymystem3 0.2.0 PyOpenGL 3.1.5 pyparsing 2.4.7 pyrsistent 0.18.0 pysndfile 1.3.8 PySocks 1.7.1 pystan 2.19.1.1 pytest 3.6.4 python-apt 0.0.0 python-chess 0.23.11 python-dateutil 2.8.2 python-louvain 0.15 python-slugify 5.0.2 python-utils 2.5.6 pytrends 4.7.3 pytz 2018.9 pyviz-comms 2.1.0 PyWavelets 1.2.0 PyYAML 3.13 pyzmq 22.3.0 qdldl 0.1.5.post0 qtconsole 5.2.0 QtPy 1.11.2 regex 2019.12.20 requests 2.23.0 requests-oauthlib 1.3.0 resampy 0.2.2 retrying 1.3.3 rpy2 3.4.5 rsa 4.7.2 scikit-image 0.18.3 scikit-learn 1.0.1 scipy 1.4.1 screen-resolution-extra 0.0.0 scs 2.1.4 seaborn 0.11.2 semver 2.13.0 Send2Trash 1.8.0 setuptools 57.4.0 setuptools-git 1.2 Shapely 1.8.0 simplegeneric 0.8.1 six 1.15.0 sklearn 0.0 sklearn-pandas 1.8.0 smart-open 5.2.1 snowballstemmer 2.1.0 sortedcontainers 2.4.0 SoundFile 0.10.3.post1 spacy 2.2.4 Sphinx 1.8.5 sphinxcontrib-serializinghtml 1.1.5 sphinxcontrib-websupport 1.2.4 SQLAlchemy 1.4.26 sqlparse 0.4.2 srsly 1.0.5 statsmodels 0.10.2 sympy 1.7.1 tables 3.4.4 tabulate 0.8.9 tblib 1.7.0 tensorboard 2.7.0 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.0 tensorflow 2.7.0 tensorflow-datasets 4.0.1 tensorflow-estimator 2.7.0 tensorflow-gcs-config 2.7.0 tensorflow-hub 0.12.0 tensorflow-io-gcs-filesystem 0.22.0 tensorflow-metadata 1.4.0 tensorflow-probability 0.14.1 termcolor 1.1.0 terminado 
0.12.1 testpath 0.5.0 text-unidecode 1.3 textblob 0.15.3 Theano-PyMC 1.1.2 thinc 7.4.0 threadpoolctl 3.0.0 tifffile 2021.11.2 toml 0.10.2 tomli 1.2.2 toolz 0.11.2 torch 1.10.0+cu111 torchsummary 1.5.1 torchtext 0.11.0 torchvision 0.11.1+cu111 tornado 5.1.1 tqdm 4.62.3 traitlets 5.1.1 tweepy 3.10.0 TwitterAPI 2.7.7 typeguard 2.7.1 typing-extensions 3.10.0.2 tzlocal 1.5.1 uritemplate 3.0.1 urllib3 1.24.3 vega-datasets 0.9.0 wasabi 0.8.2 wcwidth 0.2.5 webencodings 0.5.1 Werkzeug 1.0.1 wheel 0.37.0 widgetsnbextension 3.5.2 wordcloud 1.5.0 wrapt 1.13.3 xarray 0.18.2 xgboost 0.90 xkit 0.0.0 xlrd 1.1.0 xlwt 1.3.0 yellowbrick 1.3.post1 zict 2.0.0 zipp 3.6.0
Thank you. Since you have sklearn 1.0.x, this means that the warning might have been introduced in 1.1.
@fwetdb Ah, any luck yet in figuring out what is causing it?
@fwetdb I figured out what was causing it and opened a new pull request with the changed files. When calculation.py called cross_val_score and mean_absolute_error from sklearn, it was passing pandas Series as inputs. Those Series were set to "Int64" by default, but the dtype was not explicitly stated. My original solution was to use .to_numpy() on the pandas Series, but once I dug a little deeper, I realized that using .astype("int64"), .astype("float"), etc. works just fine too. The dtype just needs to be explicitly declared.
This error only pops up when a pandas Series with an undeclared data type is sent to a sklearn function from sklearn version <=1.0.0. The only places where I could find that happening were _calculate_model_cvscore() and _mae_normalizer() in calculation.py. Changing those Series to numpy arrays or using .astype() fixes the issue. Let me know if you have any more questions. I hope this helps make some progress.
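For readers following along, the fix described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the pull request: `as_explicit_array` is a hypothetical helper, and the sample data is made up.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error


def as_explicit_array(series: pd.Series, dtype: str = "float"):
    """Hypothetical helper: cast a Series to an explicit dtype, return a NumPy array."""
    return series.astype(dtype).to_numpy()


# Made-up data; no dtype is declared, so pandas infers one for each Series.
y_true = pd.Series([3, 5, 8, 13])
y_pred = pd.Series([2.5, 5.0, 9.0, 12.5])

# Pass explicitly typed NumPy arrays to sklearn instead of the raw Series.
mae = mean_absolute_error(
    as_explicit_array(y_true),
    as_explicit_array(y_pred),
)
print(mae)
```

Either half of the helper (`.astype(...)` or `.to_numpy()`) matches one of the two variants mentioned in the comment; combining them here is just a design choice for the sketch.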
@tkrabel @8080labs ^^^^^^^^^^^^^
Thank you a lot for looking into this! I added a small comment to #57 and would appreciate your input
Closed in favor of #57
Ran the tests and all passed. One future warning from scikit for version 1.1: