cpa-analytics / embedding-encoder

Scikit-Learn compatible transformer that turns categorical variables into dense entity embeddings.
MIT License

6 July 2022 Error: 'EmbeddingEncoder' object has no attribute '_validate_data' #23

Closed KyleeValencia closed 2 years ago

KyleeValencia commented 2 years ago

This error appeared right after I restarted the Kaggle notebook, without any change to the code or the directories. Before the restart it ran successfully. It happens both for a local EE (one that is not pretrained) [image]

and for the pretrained EE [image]

Did some dependency of this module change, or does the Kaggle environment ship a package version that is incompatible with this package?

The data and files needed on Kaggle: https://www.kaggle.com/datasets/kyleev/dl-long-mush-stalkroot-na ->pretrained https://www.kaggle.com/datasets/kyleev/tf-embed-feature-categorical-mushroom-data https://www.kaggle.com/datasets/ltrahul/mushrooms-classification-dataset

The function code:

def Feature_Target_Format(df: pd.DataFrame, valY: pd.Series):
    newD = df.copy()
    # Relabel the data
    newX = relabled_df(newD)

    # Imputer the 'Special Missing' column from pretrained neural network imputer
    Imputer_Real = Imputer.load("../input/dl-long-mush-stalkroot-na/missing_mushroom")
    missing_pred = Imputer_Real.predict(newX)
    newX = newX.join(missing_pred["stalk-root_imputed"])
    newX["stalk-root"] = np.where(newX["stalk-root"]=='missing', 
                                      newX["stalk-root_imputed"], 
                                      newX["stalk-root"])
    newX.drop(columns='stalk-root_imputed', inplace=True)

    # Format 'two special' column into numeric and boolean 
    newX[['ring-number','bruises']] = newX[['ring-number','bruises']].replace(order_int_list)

    # Scale the numeric data
    newX = Numeric_Scaler(newX, MinMaxScaler(), ['ring-number'])
    newX[['ring-number']] = newX[['ring-number']].astype('int64')
    newX[['bruises']] = newX[['bruises']].astype('int64')
#     print(newX.info())

    # Format Y_value into boolean
    newY = pd.DataFrame(valY.copy()).replace(order_int_list)['class']

    # Embedding Encoder of string Categorical Data using pretrained neural network embedding-encoder
    category_cols = list(newX.columns[(newX.dtypes=='object').values==True])
    Embed_load = EmbeddingEncoder(task = 'classification', 
                            pretrained = True, 
                            mapping_path='../input/tf-embed-feature-categorical-mushroom-data/Embed_TF_Mushromm_Categorical_Data.json')
    Embed_load.fit(newX[category_cols], newY)

    newN = Embed_load.transform(newX[category_cols])
    newX = pd.concat([newX[['ring-number','bruises']], newN], axis=1)

    return newX,newY

Edit 1: This error doesn't occur when I run it on Google Colab. But when I try to import this in Google Colab, it gives an error: [image]

rxavier commented 2 years ago

Hi. _validate_data() is a method in sklearn's BaseEstimator which EmbeddingEncoder inherits, I'm not really sure how it can be missing.

I just started a Kaggle notebook and did (after pip installing):

from embedding_encoder import EmbeddingEncoder

ee = EmbeddingEncoder(task="classification")
hasattr(ee, "_validate_data")

Which was True.

Could you provide a reproducible example?

Edit: I now notice you're running this locally. Could you run

import sklearn

sklearn.__version__

KyleeValencia commented 2 years ago

This is the sklearn version in my Kaggle notebook: [image]

rxavier commented 2 years ago

Please provide a reproducible example including the data you're using and output of pip freeze.
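
When the full pip freeze output is long, a short version report for the handful of packages involved can be easier to read alongside it. The snippet below is a minimal sketch of that idea, not something taken from this thread: the `PACKAGES` shortlist and the `installed_versions` helper are hypothetical names chosen for illustration. It relies on `importlib.metadata`, which needs Python 3.8 or newer; on older interpreters the `importlib_metadata` backport should offer the same calls.

```python
from importlib import metadata

# Hypothetical shortlist: the distributions most likely to matter for this error.
PACKAGES = ("scikit-learn", "embedding-encoder", "tensorflow", "pandas", "numpy")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in the same notebook session that raises the error matters, since a restart or a later install step can change what is installed.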

KyleeValencia commented 2 years ago

Hi, this is the code and the dependencies:

The Dataset and Pretrained Model dependencies link

The test case usage

testX_t, testY_t = Feature_Target_Format(pd.read_csv('../input/mushroom-raw-splitted-train-test-xy/mushromm_X_test_df.csv'), pd.read_csv('../input/mushroom-raw-splitted-train-test-xy/mushromm_Y_test_df.csv')['class'])

The error: [image]

The pip freeze output

absl-py @ file:///home/conda/feedstock_root/build_artifacts/absl-py_1637088766493/work
accelerate==0.10.0
access==1.1.8
affine==2.3.1
aiobotocore==2.3.3
aiohttp @ file:///home/conda/feedstock_root/build_artifacts/aiohttp_1649013150570/work
aioitertools==0.10.0
aiosignal @ file:///home/conda/feedstock_root/build_artifacts/aiosignal_1636093929600/work
albumentations==1.2.0
alembic==1.8.0
allennlp==2.9.3
altair==4.2.0
annoy==1.17.0
ansiwrap==0.8.4
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1652463872367/work/dist
apache-beam==2.39.0
aplus==0.11.0
appdirs @ file:///home/conda/feedstock_root/build_artifacts/appdirs_1603108395799/work
argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1640817743617/work
argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1649500320262/work
arrow @ file:///home/conda/feedstock_root/build_artifacts/arrow_1643313750486/work
arviz==0.12.1
asgiref==3.5.2
asn1crypto @ file:///home/conda/feedstock_root/build_artifacts/asn1crypto_1647369152656/work
astroid==2.11.6
astropy @ file:///home/conda/feedstock_root/build_artifacts/astropy_1636583255099/work
astunparse==1.6.3
async-timeout @ file:///home/conda/feedstock_root/build_artifacts/async-timeout_1640026696943/work
asynctest==0.13.0
atpublic==2.3
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1640799537051/work
audioread==2.1.9
autocfg==0.0.8
autopage==0.5.1
autopep8==1.6.0
aws-requests-auth==0.4.3
Babel @ file:///home/conda/feedstock_root/build_artifacts/babel_1651737115240/work
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work
backports.zoneinfo==0.2.1
base58==2.1.1
bayesian-optimization==1.2.0
bayespy==0.5.22
beatrix-jupyterlab @ file:///tmp/beatrix_jupyterlab-latest.tar.gz
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1649463573192/work
bidict==0.22.0
binaryornot==0.4.4
biopython==1.79
black @ file:///home/conda/feedstock_root/build_artifacts/black-recipe_1648499330704/work
blake3==0.2.1
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1649361991009/work
blinker==1.4
blis==0.7.7
bokeh==2.4.3
Boruta==0.3
boto3==1.24.10
botocore==1.27.10
-e git+https://github.com/SohierDane/BigQuery_Helper@8615a7f6c1663e7f2d48aa2b32c2dbcb600a440f#egg=bq_helper
bqplot==0.12.33
branca==0.5.0
brewer2mpl==1.4.1
brotlipy==0.7.0
cached-path==1.1.3
cached-property==1.5.2
cachetools==4.2.4
Cartopy @ file:///home/conda/feedstock_root/build_artifacts/cartopy_1630680835556/work
catalogue==1.0.0
catalyst==22.4
catboost==1.0.6
category-encoders==2.5.0
certifi==2022.6.15
cesium==0.9.12
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1636046052501/work
cftime==1.6.0
chardet @ file:///tmp/build/80754af9/chardet_1607706768982/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1644853463426/work
chex==0.1.3
clang==5.0
cleverhans==4.0.0
click==8.0.4
click-plugins==1.1.1
cliff==3.10.1
cligj==0.7.2
cloud-tpu-client==0.10
cloud-tpu-profiler==2.4.0
cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1653061851209/work
cmaes==0.8.2
cmd2==2.4.1
cmdstanpy==0.9.68
cmudict==1.0.2
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
colorcet==3.0.0
colorlog==6.6.0
colorlover==0.3.0
commonmark==0.9.1
conda==4.13.0
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1649385049221/work
configparser==5.2.0
confuse @ file:///home/conda/feedstock_root/build_artifacts/confuse_1638044079768/work
contextily==1.2.0
contextlib2==21.6.0
convertdate==2.4.0
cookiecutter @ file:///home/conda/feedstock_root/build_artifacts/cookiecutter_1643669229020/work
crcmod==1.7
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1652967085355/work
cufflinks==0.17.3
CVXcanon==0.1.2
cycler @ file:///home/conda/feedstock_root/build_artifacts/cycler_1635519461629/work
cymem==2.0.6
cysignals==1.11.2
Cython==0.29.30
cytoolz==0.11.2
daal==2021.5.3
daal4py==2021.5.3
dask==2022.2.0
dataclasses @ file:///home/conda/feedstock_root/build_artifacts/dataclasses_1628958434797/work
datasets==2.1.0
datashader==0.14.0
datashape==0.5.2
datatable==1.0.0
datatile==1.0.0
datawig==0.2.0
deap==1.3.1
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1649586340600/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work
Delorean==1.0.0
deprecat==2.1.1
deprecation==2.1.0
descartes==1.1.0
dill==0.3.5.1
dipy==1.5.0
distlib==0.3.4
distributed==2022.2.0
dlib==19.24.0
dm-tree==0.1.7
docker @ file:///home/conda/feedstock_root/build_artifacts/docker-py_1638897274897/work
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.18.1
earthengine-api==0.1.315
easydev==0.12.0
easydict==1.9
easyocr==1.5.0
ecos==2.0.10
eli5==0.13.0
embedding-encoder==0.0.4
emoji==1.7.0
en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
ephem==4.1.3
esda==2.4.1
essentia==2.1b6.dev778
explainable-ai-sdk @ file:///opt/conda/conda-bld/dlenv-tf-2-6-cpu_1653528867347/work/explainable_ai_sdk-1-py3-none-any.whl
explainers @ file:///opt/conda/conda-bld/dlenv-tf-2-6-cpu_1653528867347/work/explainers-1-cp37-cp37m-linux_x86_64.whl
fairscale==0.4.6
fastai==2.6.3
fastapi==0.78.0
fastavro==1.4.12
fastcore==1.4.4
fastdownload==0.0.6
fasteners==0.17.3
fastjsonschema @ file:///home/conda/feedstock_root/build_artifacts/python-fastjsonschema_1641751198313/work/dist
fastprogress==1.0.2
fasttext==0.9.2
fbpca==1.0
feather-format==0.4.1
featuretools==1.9.2
filelock==3.6.0
Fiona==1.8.21
fitter==1.4.0
flake8==4.0.1
flashtext==2.7
Flask==2.1.2
flatbuffers==1.12
flax==0.5.1
flit_core @ file:///home/conda/feedstock_root/build_artifacts/flit-core_1645629044586/work/source/flit_core
folium==0.12.1.post1
fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1651017735934/work
frozendict==2.3.2
frozenlist @ file:///home/conda/feedstock_root/build_artifacts/frozenlist_1648771692657/work
fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1653010523205/work
funcy==1.17
fury==0.8.0
future==0.18.2
fuzzywuzzy==0.18.0
gast==0.4.0
gatspy==0.3
gcsfs @ file:///home/conda/feedstock_root/build_artifacts/gcsfs_1653068494316/work
gensim==4.0.1
geographiclib==1.52
Geohash==1.0
geojson==2.5.0
geopandas==0.10.2
geoplot==0.5.1
geopy==2.2.0
geoviews==1.9.5
ggplot @ https://github.com/hbasria/ggpy/archive/0.11.5.zip
giddy==2.3.3
gitdb @ file:///home/conda/feedstock_root/build_artifacts/gitdb_1635085722655/work
GitPython @ file:///home/conda/feedstock_root/build_artifacts/gitpython_1645531658201/work
gluoncv==0.10.5.post0
gluonnlp==0.10.0
google-api-core==1.31.6
google-api-python-client @ file:///home/conda/feedstock_root/build_artifacts/google-api-python-client_1652842994887/work
google-apitools==0.5.31
google-auth==1.35.0
google-auth-httplib2 @ file:///home/conda/feedstock_root/build_artifacts/google-auth-httplib2_1617387471894/work
google-auth-oauthlib==0.4.6
google-cloud-aiplatform @ git+https://github.com/googleapis/python-aiplatform.git@4ed7a50fef58d694ddb29d4240965d44e383da2b
google-cloud-appengine-logging==1.1.1
google-cloud-audit-log==0.2.0
google-cloud-automl==1.0.1
google-cloud-bigquery==2.2.0
google-cloud-bigtable==2.9.0
google-cloud-core==1.7.2
google-cloud-dataproc==4.0.2
google-cloud-datastore==2.6.0
google-cloud-dlp==3.7.0
google-cloud-firestore==2.5.0
google-cloud-kms==2.11.1
google-cloud-language==2.4.2
google-cloud-logging==3.1.1
google-cloud-monitoring==2.9.1
google-cloud-pubsub==2.12.1
google-cloud-pubsublite==1.4.2
google-cloud-recommendations-ai==0.2.0
google-cloud-resource-manager==1.5.0
google-cloud-scheduler==2.6.3
google-cloud-spanner==3.14.0
google-cloud-speech==2.14.0
google-cloud-storage @ file:///home/conda/feedstock_root/build_artifacts/google-cloud-storage_1644876711050/work
google-cloud-tasks==2.9.0
google-cloud-translate==3.7.3
google-cloud-videointelligence==2.7.0
google-cloud-vision==2.7.2
google-crc32c @ file:///home/conda/feedstock_root/build_artifacts/google-crc32c_1651517221523/work
google-pasta==0.2.0
google-resumable-media==1.3.3
googleapis-common-protos @ file:///home/conda/feedstock_root/build_artifacts/googleapis-common-protos-feedstock_1652399823600/work
gplearn==0.4.2
gpxpy==1.5.0
graphviz==0.8.4
greenlet @ file:///home/conda/feedstock_root/build_artifacts/greenlet_1648882385539/work
grpc-google-iam-v1==0.12.4
grpcio==1.43.0
grpcio-gcp @ file:///home/conda/feedstock_root/build_artifacts/grpcio-gcp_1635875856259/work
grpcio-status==1.46.3
gviz-api==1.10.0
gym==0.24.1
gym-notices==0.0.7
h11==0.13.0
h2o==3.36.1.2
h5py==3.1.0
haversine==2.5.1
hdfs==2.7.0
HeapDict==1.0.1
hep-ml==0.7.1
hijri-converter==2.2.4
hmmlearn==0.2.7
holidays==0.14.2
holoviews==1.14.9
hpsklearn==0.1.0
html5lib==1.1
htmlmin==0.1.12
httplib2 @ file:///home/conda/feedstock_root/build_artifacts/httplib2_1644593570376/work
httplib2shim==0.0.3
httptools==0.4.0
huggingface-hub==0.7.0
humanize==4.1.0
hunspell==0.5.5
husl==4.0.3
hydra-slayer==0.4.0
hyperopt==0.2.7
hypertools==0.8.0
ibis-framework==2.1.1
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1642433548627/work
igraph==0.9.11
imagecodecs==2021.11.20
ImageHash @ file:///home/conda/feedstock_root/build_artifacts/imagehash_1626361020540/work
imageio==2.19.2
imbalanced-learn==0.9.0
imgaug==0.4.0
implicit @ file:///home/conda/feedstock_root/build_artifacts/implicit_1606198395798/work
importlib-metadata==4.11.4
importlib-resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1652715758048/work
inequality==1.0.0
iniconfig==1.1.1
ipydatawidgets==4.3.1.post1
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1649684273175/work/dist/ipykernel-6.13.0-py3-none-any.whl
ipyleaflet==0.16.0
ipympl==0.7.0
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1651240553635/work
ipython-genutils==0.2.0
ipython-sql @ file:///home/conda/feedstock_root/build_artifacts/ipython-sql_1636816912182/work
ipyvolume==0.5.2
ipyvue==1.7.0
ipyvuetify==1.8.2
ipywebrtc==0.6.0
ipywidgets==7.7.0
iso3166==2.0.2
isort==5.10.1
isoweek==1.3.3
itsdangerous==2.1.2
Janome==0.4.2
jax==0.3.13
jaxlib==0.3.10
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1649067102072/work
jeepney==0.8.0
jieba==0.42.1
Jinja2==3.1.2
jinja2-time @ file:///home/conda/feedstock_root/build_artifacts/jinja2-time_1646750632133/work
jmespath==1.0.0
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1633637554808/work
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1600692310011/work
jsonlines==1.2.0
jsonnet==0.18.0
jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema-meta_1651798819471/work
jupyter==1.0.0
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1652061014773/work
jupyter-console==6.4.3
jupyter-core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1652365252517/work
jupyter-http-over-ws==0.0.8
jupyter-lsp==1.5.1
jupyter-server @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_1651092495905/work
jupyter-server-mathjax @ file:///home/conda/feedstock_root/build_artifacts/jupyter-server-mathjax_1645541128695/work
jupyter-server-proxy @ file:///home/conda/feedstock_root/build_artifacts/jupyter-server-proxy_1643080298941/work
jupyterlab @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_1643984239174/work
jupyterlab-git @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab-git_1650975607360/work
jupyterlab-lsp==3.10.1
jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1649936611996/work
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1641592475363/work
jupyterlab-widgets==1.1.0
jupytext @ file:///home/conda/feedstock_root/build_artifacts/jupytext_1649224989735/work
kaggle==1.5.12
kaggle-environments==1.9.10
keras==2.9.0
Keras-Preprocessing==1.1.2
keras-tuner==1.1.2
keyring==23.5.1
keyrings.google-artifactregistry-auth==1.0.0
kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1648854392523/work
kmapper==2.0.1
kmodes==0.12.1
korean-lunar-calendar==0.2.1
kornia==0.5.8
kt-legacy==1.0.4
kubernetes @ file:///home/conda/feedstock_root/build_artifacts/python-kubernetes_1652020343043/work
langcodes==3.3.0
langid==1.1.6
lazy-object-proxy==1.7.1
learntools @ git+https://github.com/Kaggle/learntools@7373faf5cfb8ca83046e78bac09314d90fbf2474
leven==1.0.4
libclang==14.0.1
libpysal==4.6.2
librosa==0.9.1
lightfm==1.16
lightgbm==3.3.2
lime==0.2.0.1
line-profiler==3.5.1
llvmlite==0.38.1
lmdb==1.3.0
lml==0.1.0
locket==1.0.0
LunarCalendar==0.0.9
lxml==4.9.0
Mako==1.2.0
mapclassify==2.4.3
marisa-trie==0.7.7
Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1651821407140/work
markdown-it-py @ file:///home/conda/feedstock_root/build_artifacts/markdown-it-py_1650305363826/work
markovify==0.9.4
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1635833550185/work
matplotlib==3.5.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1631080358261/work
matplotlib-venn==0.11.7
matrixprofile @ git+https://github.com/matrix-profile-foundation/matrixprofile.git@6bea7d4445284dbd9700a097974ef6d4613fbca7
mccabe==0.6.1
mdit-py-plugins @ file:///home/conda/feedstock_root/build_artifacts/mdit-py-plugins_1639763187273/work
mdurl @ file:///home/conda/feedstock_root/build_artifacts/mdurl_1639515908913/work
memory-profiler==0.60.0
mercantile==1.2.1
mgwr==2.1.2
missingno==0.4.2
mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1635844677043/work
mizani==0.7.3
mlcrate==0.2.0
mlens==0.2.3
mlxtend==0.20.0
mmh3==3.0.0
mne==1.0.3
mnist==0.2.2
mock==4.0.3
momepy==0.5.3
more-itertools==8.13.0
mpld3==0.5.8
mpmath==1.2.1
msgpack==1.0.4
msgpack-numpy==0.4.8
multidict @ file:///home/conda/feedstock_root/build_artifacts/multidict_1648882415996/work
multimethod @ file:///home/conda/feedstock_root/build_artifacts/multimethod_1603129052241/work
multipledispatch==0.6.0
multiprocess==0.70.13
munch==2.5.0
munkres==1.1.4
murmurhash==1.0.7
mxnet==1.4.0
mypy-extensions @ file:///home/conda/feedstock_root/build_artifacts/mypy_extensions_1649013329265/work
nb-conda @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_1611345550379/work
nb-conda-kernels @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_kernels_1636999991206/work
nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1647450696711/work
nbclient @ file:///home/conda/feedstock_root/build_artifacts/nbclient_1646999386773/work
nbconvert @ file:///home/conda/feedstock_root/build_artifacts/nbconvert-meta_1648822144012/work
nbdime @ file:///home/conda/feedstock_root/build_artifacts/nbdime_1635269257164/work
nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1651607001005/work
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1648959695634/work
netCDF4==1.5.8
networkx @ file:///home/conda/feedstock_root/build_artifacts/networkx_1598210780226/work
nibabel==3.2.2
nilearn==0.9.1
nltk==3.2.4
nnabla==1.28.0
nose==1.3.7
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1650363291341/work
notebook-executor @ file:///opt/conda/conda-bld/dlenv-base_1653527147777/work/packages/notebook_executor
notebook-shim @ file:///home/conda/feedstock_root/build_artifacts/notebook-shim_1646330736330/work
numba @ file:///home/conda/feedstock_root/build_artifacts/numba_1652226558760/work
numexpr==2.8.1
numpy==1.14.6
oauth2client==4.1.3
oauthlib @ file:///home/conda/feedstock_root/build_artifacts/oauthlib_1643507977997/work
odfpy==1.4.1
olefile==0.46
onnx==1.11.0
opencv-contrib-python==4.5.4.60
opencv-python==4.5.4.60
opencv-python-headless==4.5.4.60
openslide-python==1.1.2
opt-einsum==3.3.0
optax==0.1.2
optuna==2.10.1
orderedmultidict==1.0.1
orjson==3.6.8
ortools==9.3.10497
osmnx==1.1.1
overrides==6.1.0
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1637239678211/work
palettable==3.3.0
pandarallel==1.6.1
pandas==0.25.3
pandas-datareader==0.10.0
pandas-profiling==2.4.0
pandas-summary==0.2.0
pandasql==0.7.3
pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work
panel==0.13.1
papermill @ file:///home/conda/feedstock_root/build_artifacts/papermill_1642949624634/work
param==1.12.1
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
parsy==1.4.0
partd==1.2.0
path==16.4.0
path.py==12.5.0
pathos==0.2.9
pathspec @ file:///home/conda/feedstock_root/build_artifacts/pathspec_1626613672358/work
pathtools==0.1.2
pathy==0.6.1
patsy @ file:///home/conda/feedstock_root/build_artifacts/patsy_1632667180946/work
pbr==5.9.0
pdf2image==1.16.0
PDPbox @ git+https://github.com/SauceCat/PDPbox@b022a0aabcc6dbe2440244bf48d08fbb6ecdaf2d
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1602535608087/work
phik @ file:///home/conda/feedstock_root/build_artifacts/phik_1647910144007/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1652814980128/work
plac==1.1.3
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1645298319244/work
plotly==5.8.2
plotly-express==0.4.1
plotnine==0.8.0
pluggy==1.0.0
pointpats==2.2.0
polyglot==16.7.4
pooch==1.6.0
portalocker==2.4.0
pox==0.3.1
poyo==0.5.0
ppca==0.0.4
ppft==1.7.6.5
preprocessing==0.1.13
preshed==3.0.6
prettytable @ file:///home/conda/feedstock_root/build_artifacts/prettytable_1651787307815/work
progressbar2==4.0.0
prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1649447152425/work
promise==2.3
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1649130487073/work
pronouncing==0.2.0
prophet==1.0.1
proto-plus==1.20.4
protobuf==3.19.4
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1653089169272/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pudb==2022.1.1
PuLP==2.6.0
py==1.11.0
py-lz4framed==0.14.0
py-stringmatching==0.4.2
py-stringsimjoin==0.3.2
py4j==0.10.9.5
pyaml==21.10.1
PyArabic==0.6.14
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.7
PyAstronomy==0.17.1
pybind11==2.9.2
pycodestyle==2.8.0
pycosat==0.6.3
pycountry==22.3.5
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pycrypto==2.6.1
pyct==0.4.8
pydantic==1.8.2
pydash==5.1.0
pydegensac==0.1.2
pyDeprecate==0.3.2
pydicom==2.3.0
pydocstyle==6.1.1
pydot==1.4.2
pydub==0.25.1
pyemd==0.5.1
pyerfa @ file:///home/conda/feedstock_root/build_artifacts/pyerfa_1649586111662/work
pyexcel-io==0.6.6
pyexcel-ods==0.6.0
pyfasttext==0.4.6
pyflakes==2.4.0
pygeos==0.12.0
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1650904496387/work
PyJWT @ file:///home/conda/feedstock_root/build_artifacts/pyjwt_1652398519695/work
pykalman==0.9.5
pyLDAvis==3.2.2
pylint==2.14.2
pymc3==3.11.5
PyMeeus==0.5.11
pymongo==3.12.3
Pympler==1.0.1
pynndescent==0.5.7
pyocr==0.8.2
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1643496850550/work
pyparsing @ file:///home/conda/feedstock_root/build_artifacts/pyparsing_1652235407899/work
pyPdf==1.13
pyperclip==1.8.2
PyPrind==2.11.3
pyproj @ file:///home/conda/feedstock_root/build_artifacts/pyproj_1623801868210/work
pyrsistent @ file:///home/conda/feedstock_root/build_artifacts/pyrsistent_1649013358450/work
pysal==2.6.0
pyshp @ file:///home/conda/feedstock_root/build_artifacts/pyshp_1651509119669/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
pystan==2.19.1.1
pytesseract==0.3.9
pytest==7.1.2
python-bidi==0.4.2
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-dotenv==0.20.0
python-igraph==0.9.11
python-Levenshtein==0.12.2
python-louvain==0.16
python-lsp-jsonrpc==1.0.0
python-lsp-server==1.4.1
python-slugify @ file:///home/conda/feedstock_root/build_artifacts/python-slugify_1651150815876/work
python-utils==3.3.3
pythreejs==2.3.0
pytorch-ignite==0.4.9
pytorch-lightning==1.6.4
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1647961439546/work
pytz-deprecation-shim==0.1.0.post0
pyu2f @ file:///home/conda/feedstock_root/build_artifacts/pyu2f_1604248910016/work
PyUpSet==0.1.1.post7
pyviz-comms==2.2.0
PyWavelets @ file:///home/conda/feedstock_root/build_artifacts/pywavelets_1649616401885/work
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1648757092905/work
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1652965483789/work
qgrid==1.3.1
qtconsole==5.3.0
QtPy==2.1.0
quantecon==0.5.3
quantities==0.13.0
qudida==0.0.4
quilt3==5.0.0
randomgen==1.21.2
rasterio==1.2.10
rasterstats==0.16.0
ray==1.13.0
regex==2021.11.10
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1641580202195/work
requests-futures==1.0.0
requests-oauthlib @ file:///home/conda/feedstock_root/build_artifacts/requests-oauthlib_1643557462909/work
resampy==0.2.2
responses==0.18.0
retrying==1.3.3
rgf-python==3.12.0
rich==12.4.4
rope==1.1.1
rsa @ file:///home/conda/feedstock_root/build_artifacts/rsa_1637781155505/work
Rtree==1.0.0
ruamel-yaml-conda @ file:///tmp/build/80754af9/ruamel_yaml_1616016701961/work
rvlib==0.0.6
s2sphere==0.2.5
s3fs==2022.5.0
s3transfer==0.6.0
sacremoses==0.0.53
scattertext==0.1.6
scikit-image==0.18.3
scikit-learn==0.22.1
scikit-learn-intelex==2021.5.3
scikit-multilearn==0.2.0
scikit-optimize==0.9.0
scikit-plot==0.3.7
scikit-surprise==1.1.1
scipy==1.5.4
seaborn @ file:///home/conda/feedstock_root/build_artifacts/seaborn-split_1629095986539/work
SecretStorage==3.3.2
segregation==2.2.3
semver==2.13.0
Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1628511208346/work
sentencepiece==0.1.96
sentry-sdk==1.5.12
setproctitle==1.2.3
setuptools-git==1.2
shap==0.41.0
Shapely @ file:///home/conda/feedstock_root/build_artifacts/shapely_1635194349843/work
shortuuid==1.0.9
simpervisor @ file:///home/conda/feedstock_root/build_artifacts/simpervisor_1609865618711/work
SimpleITK==2.1.1.2
simplejson==3.17.6
six @ file:///tmp/build/80754af9/six_1623709665295/work
sklearn==0.0
sklearn-contrib-py-earth @ git+https://github.com/scikit-learn-contrib/py-earth.git@dde5f899255411a7b9cbbabf93a817eff4b02e5e
sklearn-pandas==2.2.0
slicer==0.0.7
smart-open==5.2.1
smhasher==0.150.1
smmap @ file:///home/conda/feedstock_root/build_artifacts/smmap_1611376390914/work
sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1648819180181/work
snowballstemmer==2.2.0
snuggs==1.4.7
sortedcontainers==2.4.0
SoundFile==0.10.3.post1
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1638550740809/work
spacy==2.3.7
spacy-legacy==3.0.9
spacy-loggers==1.0.2
spaghetti==1.6.5
spectral==0.22.4
spglm==1.0.8
sphinx-rtd-theme==0.2.4
spint==1.0.7
splot==1.1.5.post1
spopt==0.4.1
spreg==1.2.4
spvcm==0.3.0
SQLAlchemy @ file:///home/conda/feedstock_root/build_artifacts/sqlalchemy_1651017966921/work
sqlparse @ file:///home/conda/feedstock_root/build_artifacts/sqlparse_1631317292236/work
squarify==0.4.3
srsly==1.0.5
starlette==0.19.1
statsmodels @ file:///home/conda/feedstock_root/build_artifacts/statsmodels_1644535599043/work
stemming==1.0.1
stevedore==3.5.0
stop-words==2018.7.23
stopit==1.1.2
stumpy==1.11.1
sympy==1.10.1
tabulate==0.8.9
tangled-up-in-unicode @ file:///home/conda/feedstock_root/build_artifacts/tangled-up-in-unicode_1632832610704/work
tbb==2021.6.0
tblib==1.7.0
tenacity @ file:///home/conda/feedstock_root/build_artifacts/tenacity_1626090218611/work
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.4.0
tensorboard-plugin-wit==1.8.1
tensorboardX==2.5.1
tensorflow==2.9.1
tensorflow-addons==0.14.0
tensorflow-cloud==0.1.14
tensorflow-datasets==4.3.0
tensorflow-estimator==2.9.0
tensorflow-gcs-config==2.6.0
tensorflow-hub==0.12.0
tensorflow-io==0.21.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.8.0
tensorflow-probability==0.14.1
tensorflow-serving-api==2.8.0
tensorflow-transform==1.8.0
tensorpack==0.11
termcolor==1.1.0
terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1652790603075/work
testpath @ file:///home/conda/feedstock_root/build_artifacts/testpath_1645693042223/work
text-unidecode==1.3
textblob==0.17.1
texttable==1.6.4
textwrap3==0.9.2
tfx-bsl==1.8.0
Theano==1.0.5
Theano-PyMC==1.1.2
thinc==7.4.5
threadpoolctl @ file:///home/conda/feedstock_root/build_artifacts/threadpoolctl_1643647933166/work
tifffile==2021.11.2
tinycss2 @ file:///home/conda/feedstock_root/build_artifacts/tinycss2_1637612658783/work
tobler==0.9.0
tokenizers==0.12.1
toml @ file:///home/conda/feedstock_root/build_artifacts/toml_1604308577558/work
tomli @ file:///home/conda/feedstock_root/build_artifacts/tomli_1644342247877/work
tomlkit==0.11.0
toolz==0.11.2
torch==1.11.0+cpu
torchaudio==0.11.0+cpu
torchmetrics==0.9.1
torchtext==0.12.0
torchvision==0.12.0+cpu
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1648827244717/work
TPOT==0.11.7
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1649051611147/work
traceml==1.0.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1652735690480/work
traittypes==0.2.1
transformers==4.18.0
trueskill==0.4.5
tsfresh==0.19.0
typed-ast @ file:///home/conda/feedstock_root/build_artifacts/typed-ast_1653226021340/work
typeguard==2.13.3
typer==0.4.1
typing==3.6.6
typing-utils==0.1.0
typing_extensions==4.1.1
tzdata==2022.1
tzlocal==4.2
ujson @ file:///home/conda/feedstock_root/build_artifacts/ujson_1653057311506/work
umap-learn==0.5.3
unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1649111917568/work
Unidecode @ file:///home/conda/feedstock_root/build_artifacts/unidecode_1646918762405/work
update-checker==0.18.0
uritemplate==3.0.1
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1647489083693/work
urwid==2.1.2
urwid-readline==0.13
uvicorn==0.17.6
uvloop==0.16.0
vaex==4.9.2
vaex-astro==0.9.1
vaex-core==4.9.2
vaex-hdf5==0.12.2
vaex-jupyter==0.8.0
vaex-ml==0.17.0
vaex-server==0.8.1
vaex-viz==0.5.2
vecstack==0.4.0
virtualenv==20.14.1
visions @ file:///home/conda/feedstock_root/build_artifacts/visions_1638743854326/work
vowpalwabbit==9.1.0
vtk==9.1.0
Wand==0.6.7
wandb==0.12.18
wasabi==0.9.1
watchgod==0.8.2
wavio==0.0.4
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1600965781394/work
webencodings==0.5.1
websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1648562593984/work
websockets==10.3
Werkzeug==2.1.2
wfdb==3.4.1
widgetsnbextension==3.6.0
witwidget==1.8.0
woodwork==0.16.3
Wordbatch==1.4.9
wordcloud==1.8.1
wordsegment==1.3.1
wrapt @ file:///home/conda/feedstock_root/build_artifacts/wrapt_1651495229974/work
wslink==1.6.5
xai-tabular-widget @ file:///opt/conda/conda-bld/dlenv-tf-2-6-cpu_1653528867347/work/xai_tabular_widget-1-py3-none-any.whl
xarray==0.20.2
xarray-einstats==0.2.2
xgboost==1.6.1
xvfbwrapper==0.2.9
xxhash==3.0.0
xyzservices==2022.4.0
yacs==0.1.8
yapf==0.32.0
yarl @ file:///home/conda/feedstock_root/build_artifacts/yarl_1648966511831/work
yellowbrick==1.4
zict==2.2.0
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1649012893348/work

The dependencies installation

!pip install datawig
!pip install embedding-encoder[full]

The dependencies module import

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.compose import ColumnTransformer

import pandas as pd
import numpy as np

from embedding_encoder import EmbeddingEncoder
from embedding_encoder.utils.compose import ColumnTransformerWithNames

from datawig import Imputer, SimpleImputer

The function and variable declarations

def relabled_df (df):
    true_label = {
        "cap-shape":{
            'b':"bell",
            'c':'conical',
            'x':'convex',
            'f':'flat',
            'k':'knobbed',
            's':'sunken'
        },
        "cap-surface":{
            'f':'fibrous',
            'g':'grooves',
            'y':'scaly',
            's':'smooth'
        },
        "cap-color":{
            'n':'brown',
            'b':'buff',
            'c': 'cinnamon',
            'g': 'gray',
            'r': 'green',
            'p': 'pink',
            'u': 'purple',
            'e': 'red',
            'w': 'white',
            'y': 'yellow'
        },
        'bruises':{
            't': 'yes',
            'f': 'no'
        },
        'odor':{
            'a': 'almond',
            'l': 'anise',
            'c': 'creosote',
            'y': 'fishy',
            'f': 'foul', 
            'm': 'musty',
            'n': 'none',
            'p': 'pungent',
            's': 'spicy'
        },
        'gill-attachment':{
            'a': 'attached',
            'd': 'descending',
            'f': 'free',
            'n': 'notched'
        },
        'gill-spacing':{
            'c': 'close',
            'w': 'crowded',
            'd': 'distant'
        },
        'gill-size':{
            'b': 'broad',
            'n': 'narrow'
        },
        'gill-color':{
            'k': 'black',
            'n': 'brown',
            'b': 'buff',
            'h': 'chocolate',
            'g': 'gray',
            'r': 'green',
            'o': 'orange',
            'p': 'pink',
            'u': 'purple',
            'e': 'red',
            'w': 'white',
            'y': 'yellow'
        },
        'stalk-shape':{
            'e': 'enlarging',
            't': 'tapering'
        },
        'stalk-root':{
            'b': 'bulbous',
            'c': 'club',
            'u': 'cup',
            'e': 'equal',
            'z': 'rhizomorphs',
            'r': 'rooted',
            '?': 'missing'
        },
        'stalk-surface-above-ring':{
            'f': 'fibrous',
            'y': 'scaly',
            'k': 'silky',
            's': 'smooth'
        },
        'stalk-surface-below-ring':{
            'f': 'fibrous',
            'y': 'scaly',
            'k': 'silky',
            's': 'smooth'
        },
        'stalk-color-above-ring':{
            'n':'brown',
            'b':'buff',
            'c': 'cinnamon',
            'g': 'gray',
            'o': 'orange',
            'p': 'pink',
            'e': 'red',
            'w': 'white',
            'y': 'yellow'
        },
        'stalk-color-below-ring':{
            'n':'brown',
            'b':'buff',
            'c': 'cinnamon',
            'g': 'gray',
            'o': 'orange',
            'p': 'pink',
            'e': 'red',
            'w': 'white',
            'y': 'yellow'
        },
        'veil-type':{
            'p': 'partial',
            'u': 'universal'
        },
        'veil-color':{
            'n': 'brown',
            'o': 'orange',
            'w': 'white',
            'y': 'yellow'
        },
        'ring-number':{
            'n': 'none',
            'o': 'one',
            't': 'two'
        },
        'ring-type':{
            'c': 'cobwebby',
            'e': 'evanescent',
            'f': 'flaring',
            'l': 'large',
            'n': 'none',
            'p': 'pendant',
            's': 'sheating',
            'z': 'zone'
        },
        'spore-print-color':{
            'k': 'black',
            'n': 'brown',
            'b': 'buff',
            'h': 'chocolate',
            'r': 'green',
            'o': 'orange',
            'u': 'purple',
            'w': 'white',
            'y': 'yellow'
        },
        'population':{
            'a': 'abundant',
            'c': 'clustered',
            'n': 'numerous',
            's': 'scattered',
            'v': 'several',
            'y': 'solitary'
        },
        'habitat':{
            'g': 'grasses',
            'l': 'leaves',
            'm': 'meadows',
            'p': 'paths', 
            'u': 'urban',
            'w': 'waste',
            'd': 'woods'
        },
        'class':{
            'e': 'edible',
            'p': 'poisonous'
        }
    }
    a = df.copy()
    for x in a.columns:
        a[x] = a[x].replace(true_label[x])

    return a
#     print((df.columns).isin(true_label.keys()))

classes,bruises, ring_number = {'e':1, 'p':0}, {'no':0, 'yes':1},{'none':0, 'one':1, 'two':2}
order_int_list = {'bruises': bruises,'ring-number': ring_number,
                 'class':classes}

def Numeric_Scaler (df: pd.DataFrame, scaler : StandardScaler,numeric_list_name: list):
    data = df.copy()
    fitted = ColumnTransformer(transformers=[("numerical_scale", scaler, numeric_list_name)], 
                           remainder='passthrough')

    ordered_non_numeric = [x for x in df.columns if x not in numeric_list_name]

    transformedDf = pd.DataFrame(fitted.fit_transform(data), columns = numeric_list_name+ordered_non_numeric)
    return transformedDf

def Feature_Target_Format(df: pd.DataFrame, valY: pd.Series):
    newD = df.copy()
    # Relabel the data
    newX = relabled_df(newD)

    # Impute the 'Special Missing' column using the pretrained neural network imputer
    Imputer_Real = Imputer.load("../input/dl-long-mush-stalkroot-na/missing_mushroom")
    missing_pred = Imputer_Real.predict(newX) # Work on kaggle
#     missing_pred = Imputer_Real.transform(newX)
#     missing_pred = pd.DataFrame(missing_pred['stalk-root'], 
#                                 columns = ['stalk-root_imputed']).set_index(keys= newX.index) # Work on colab

    newX = newX.join(missing_pred["stalk-root_imputed"])
    newX["stalk-root"] = np.where(newX["stalk-root"]=='missing', 
                                      newX["stalk-root_imputed"], 
                                      newX["stalk-root"])
    newX.drop(columns='stalk-root_imputed', inplace=True)
#     print(newX.info())

    # Format 'two special' column into numeric and boolean 
    newX[['ring-number','bruises']] = newX[['ring-number','bruises']].replace(order_int_list)
#     print(newX.info())
    # Scale the numeric data
    newX = Numeric_Scaler(newX, MinMaxScaler(), ['ring-number'])
#     print(newX.info())
    newX[['ring-number']] = newX[['ring-number']].astype('int64')
    newX[['bruises']] = newX[['bruises']].astype('int64')
#     print(newX.info())

    # Format Y_value into boolean
    newY = pd.DataFrame(valY.copy()).replace(order_int_list)['class']
#     newY = valY.copy()

    # Embedding Encoder of string Categorical Data using pretrained neural network embedding-encoder
    category_cols = list(newX.columns[(newX.dtypes=='object').values==True])
    Embed_load = EmbeddingEncoder(task = 'classification', 
                            pretrained = True, 
                            mapping_path='../input/tf-embed-feature-categorical-mushroom-data/Embed_TF_Mushromm_Categorical_Data.json')
#     print(newX.info())
#     print(Embed_load)
    Embed_load.fit(newX[category_cols], newY)

    newN = Embed_load.transform(newX[category_cols])
    newX = pd.concat([newX[['ring-number','bruises']], newN], axis=1)

    return newX,newY

rxavier commented 2 years ago

The package list right there shows scikit-learn==0.22.1, whose BaseEstimator didn't have a _validate_data() method. Try upgrading to a newer version. For reference, EE was built using scikit-learn 1.0.2.
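A quick way to catch this kind of mismatch before fitting is to compare the installed scikit-learn version against the one EE was built with. The snippet below is only a minimal sketch: the helper names (version_tuple, is_older, installed_version, check_sklearn) are made up for illustration and are not part of embedding-encoder, and using 1.0.2 as the threshold is an assumption taken from this thread, not a documented minimum requirement.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Version EE was built with, according to this thread (not an official minimum).
REFERENCE_VERSION = "1.0.2"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.22.1' into (0, 22, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed: str, reference: str = REFERENCE_VERSION) -> bool:
    """True when the installed version sorts before the reference version."""
    return version_tuple(installed) < version_tuple(reference)


def installed_version(package: str = "scikit-learn") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_sklearn() -> str:
    """Build a short human-readable report about the installed scikit-learn."""
    found = installed_version()
    if found is None:
        return "scikit-learn is not installed"
    if is_older(found):
        return f"scikit-learn {found} is older than {REFERENCE_VERSION}; consider upgrading"
    return f"scikit-learn {found} is at least {REFERENCE_VERSION}"


print(check_sklearn())
```

If the report says the version is older, running !pip install -U scikit-learn in a notebook cell and restarting the kernel is the usual way to upgrade.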

KyleeValencia commented 2 years ago

Okay, I already changed it and it works now.