NVIDIA-Merlin / core

Core Utilities for NVIDIA Merlin
Apache License 2.0

Convert to `cudf.Series` in `create_multihot_col` #187

Closed: oliverholworthy closed this pull request 1 year ago
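The PR title refers to `create_multihot_col`, which, as its name suggests, assembles a list-valued ("multi-hot") column. The minimal sketch below assumes that such a column is described by an offsets array plus a flat elements array in the Apache Arrow list layout, where row `i` spans `elements[offsets[i]:offsets[i + 1]]`. The helper `offsets_to_lists` is hypothetical and pure Python. It does not use cudf and is not the code changed in this PR.

```python
from typing import List, Sequence


def offsets_to_lists(offsets: Sequence[int], elements: Sequence) -> List[list]:
    """Hypothetical helper: decode an Arrow-style offsets/elements pair into rows.

    Row i is elements[offsets[i]:offsets[i + 1]], so `offsets` holds one more
    entry than there are rows. Validation here is deliberately simplified.
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one entry")
    if offsets[0] < 0 or offsets[-1] > len(elements):
        raise ValueError("offsets must lie within [0, len(elements)]")
    # Each consecutive pair of offsets must describe a non-negative-length slice.
    if any(start > end for start, end in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    return [list(elements[start:end]) for start, end in zip(offsets, offsets[1:])]


# Three rows holding 2, 0 and 3 items respectively.
rows = offsets_to_lists([0, 2, 2, 5], [10, 11, 20, 21, 22])
print(rows)  # [[10, 11], [], [20, 21, 22]]
```

With cudf installed, nested Python lists like `rows` can be passed to `cudf.Series(...)` to obtain a list-dtype series on the GPU, which is the kind of object the PR title says the function should return. Check the merged diff for the actual implementation.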

oliverholworthy commented 1 year ago
nvidia-merlin-bot commented 1 year ago
CI Results
GitHub pull request #187 of commit 855ed86dcb78b3610c60dbaf1590198d53038f3d, no merge conflicts.
Running as SYSTEM
Setting status of 855ed86dcb78b3610c60dbaf1590198d53038f3d to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/353/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_core/workspace
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/187/*:refs/remotes/origin/pr/187/* # timeout=10
 > git rev-parse 855ed86dcb78b3610c60dbaf1590198d53038f3d^{commit} # timeout=10
Checking out Revision 855ed86dcb78b3610c60dbaf1590198d53038f3d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 855ed86dcb78b3610c60dbaf1590198d53038f3d # timeout=10
Commit message: "Convert to cudf series from create_multihot_col"
 > git rev-list --no-walk 725484f7d75f9ecd0b4829cb68171abfeff02bec # timeout=10
[workspace] $ /bin/bash /tmp/jenkins6463472059335043672.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.9.0+19.g855ed86.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,
ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.9.0+19.g855ed86,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,
# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,
sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1152951098'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 398 items / 1 skipped

tests/unit/core/test_dispatch.py ....                                    [  1%]
tests/unit/core/test_protocols.py .........                              [  3%]
tests/unit/core/test_version.py .                                        [  3%]
tests/unit/dag/test_base_operator.py ....                                [  4%]
tests/unit/dag/test_column_selector.py ..............................    [ 12%]
tests/unit/dag/test_dictarray.py ...                                     [ 12%]
tests/unit/dag/test_executors.py ..                                      [ 13%]
tests/unit/dag/test_graph.py ....                                        [ 14%]
tests/unit/dag/ops/test_selection.py ....                                [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................         [ 43%]
tests/unit/schema/test_column_schemas.py ............................... [ 51%]
                                                                         [ 51%]
tests/unit/schema/test_schema.py .............                           [ 54%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
...........................................................              [ 96%]
tests/unit/schema/test_tags.py .......                                   [ 97%]
tests/unit/utils/test_utils.py ........                                  [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
    warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
    warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:579: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
    paths = [p.path for p in pa_dataset.pieces]

tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
  /var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:245: FutureWarning: `gather_statistics` is now deprecated and will be ignored.
    warnings.warn(

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
  /var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [, ].
    warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
  /var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [, ].
    warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
  /var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [, ].
    warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
  /var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [, ].
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 34731 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 38825 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 43869 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 40589 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 39305 instead
    warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
  /var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 36247 instead
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
merlin/core/__init__.py                       2      0   100%
merlin/core/_version.py                     354    205    42%
merlin/core/compat.py                        10      4    60%
merlin/core/dispatch.py                     366    219    40%
merlin/core/protocols.py                     99     45    55%
merlin/core/utils.py                        197     56    72%
merlin/dag/__init__.py                        5      0   100%
merlin/dag/base_operator.py                 121     20    83%
merlin/dag/dictarray.py                      55     15    73%
merlin/dag/executors.py                     141     68    52%
merlin/dag/graph.py                          99     35    65%
merlin/dag/node.py                          344    161    53%
merlin/dag/ops/__init__.py                    4      0   100%
merlin/dag/ops/concat_columns.py             17      4    76%
merlin/dag/ops/selection.py                  22      0   100%
merlin/dag/ops/subset_columns.py             12      4    67%
merlin/dag/ops/subtraction.py                21     11    48%
merlin/dag/selector.py                      101      6    94%
merlin/io/__init__.py                         4      0   100%
merlin/io/avro.py                            88     88     0%
merlin/io/csv.py                             57      6    89%
merlin/io/dask.py                           181     53    71%
merlin/io/dataframe_engine.py                61      5    92%
merlin/io/dataframe_iter.py                  21      1    95%
merlin/io/dataset.py                        347     54    84%
merlin/io/dataset_engine.py                  37      8    78%
merlin/io/fsspec_utils.py                   127    108    15%
merlin/io/hugectr.py                         45     35    22%
merlin/io/parquet.py                        624     70    89%
merlin/io/shuffle.py                         38     12    68%
merlin/io/worker.py                          80     66    18%
merlin/io/writer.py                         190     52    73%
merlin/io/writer_factory.py                  18      4    78%
merlin/schema/__init__.py                     2      0   100%
merlin/schema/io/__init__.py                  0      0   100%
merlin/schema/io/proto_utils.py              20      4    80%
merlin/schema/io/schema_bp.py               306      5    98%
merlin/schema/io/tensorflow_metadata.py     190     17    91%
merlin/schema/schema.py                     229     31    86%
merlin/schema/tags.py                        82      1    99%
-------------------------------------------------------------
TOTAL                                      4717   1473    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 398 passed, 1 skipped, 29 warnings in 70.99s (0:01:10) ============
___________________________________ summary ____________________________________
  test-gpu: commands succeeded
  congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[workspace] $ /bin/bash /tmp/jenkins8544453252227183398.sh
github-actions[bot] commented 1 year ago

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-187