dask / dask-kubernetes

Native Kubernetes integration for Dask
https://kubernetes.dask.org
BSD 3-Clause "New" or "Revised" License

XGBoost error when using HelmCluster #303

Closed Chris-hughes10 closed 3 years ago

Chris-hughes10 commented 3 years ago

What happened: I was following the tutorial at https://coiled.io/blog/xgboost-frictionless-training/ but keep hitting an error when creating an xgb.dask.DaskDMatrix with a HelmCluster cluster. There are no issues when running the same code with LocalCluster. The cluster is otherwise working: I can scale it manually and can track tasks using the dashboard.

The error I am experiencing is:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<timed exec> in <module>

/anaconda/envs/vaex-env/lib/python3.8/site-packages/xgboost/dask.py in __init__(self, client, data, label, missing, weight, base_margin, label_lower_bound, label_upper_bound, feature_names, feature_types)
    225         self.is_quantile = False
    226 
--> 227         self._init = client.sync(self.map_local_data,
    228                                  client, data, label=label, weights=weight,
    229                                  base_margin=base_margin,

/anaconda/envs/vaex-env/lib/python3.8/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    834             return future
    835         else:
--> 836             return sync(
    837                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    838             )

/anaconda/envs/vaex-env/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    338     if error[0]:
    339         typ, exc, tb = error[0]
--> 340         raise exc.with_traceback(tb)
    341     else:
    342         return result[0]

/anaconda/envs/vaex-env/lib/python3.8/site-packages/distributed/utils.py in f()
    322             if callback_timeout is not None:
    323                 future = asyncio.wait_for(future, callback_timeout)
--> 324             result[0] = yield future
    325         except Exception as exc:
    326             error[0] = sys.exc_info()

/anaconda/envs/vaex-env/lib/python3.8/site-packages/tornado/gen.py in run(self)
    760 
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/anaconda/envs/vaex-env/lib/python3.8/site-packages/xgboost/dask.py in map_local_data(self, client, data, label, weights, base_margin, label_lower_bound, label_upper_bound)
    311 
    312         for part in parts:
--> 313             assert part.status == 'finished'
    314 
    315         # Preserving the partition order for prediction.

AssertionError: 

I'm not sure whether this is due to XGBoost or dask-kubernetes; I decided to post here because it works fine locally.

What you expected to happen:

I would expect the same behaviour as when running on the local cluster.

Minimal Complete Verifiable Example:

import dask.dataframe as dd  # needed for dd.read_parquet below
from dask.distributed import Client, LocalCluster
from dask_kubernetes import HelmCluster
from dask_ml.model_selection import train_test_split
from dask_ml.preprocessing import Categorizer
import xgboost as xgb

# cluster = LocalCluster() # works fine

cluster = HelmCluster(release_name='dask',
                      port_forward_cluster_ip=True)

# Connect to the cluster
client = Client(cluster)
# Load the example dataset sample - specify columns
columns = [
    "interest_rate", "loan_age", "num_borrowers", 
    "borrower_credit_score", "num_units"
]
categorical = [
    "orig_channel", "occupancy_status", "property_state",
    "first_home_buyer", "loan_purpose", "property_type",
    "zip", "relocation_mortgage_indicator", "delinquency_12"
]

# Download data from S3
mortgage_data = dd.read_parquet(
    "s3://coiled-data/mortgage-2000.parq/*", 
    compression="gzip",
    columns=columns + categorical, 
    storage_options={"anon": True}
)

# Cache the data on Cluster workers
mortgage_data = mortgage_data.persist()

# Cast categorical columns to the correct type

ce = Categorizer(columns=categorical)
mortgage_data = ce.fit_transform(mortgage_data)
for col in categorical:
    mortgage_data[col] = mortgage_data[col].cat.codes

# Create the train-test split

X, y = mortgage_data.iloc[:, :-1], mortgage_data["delinquency_12"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, shuffle=True, random_state=2
)

# Tried with and without this
# X_train = X_train.persist()
# y_train = y_train.persist()

# Create the XGBoost DMatrix
# Fails here
dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)    

Anything else we need to know?:

Environment:

jacobtomlinson commented 3 years ago

Thanks for opening this @Chris-hughes10.

My initial thought is that you do not have xgboost installed on the cluster if you are using the default daskdev/dask:2021.1.0 image.

You should always ensure your remote Python environment matches your local one.
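One way to check this from the client is `Client.get_versions`, which gathers package versions from the scheduler, the workers, and the client. The helper below is only a sketch: `find_mismatches` is a hypothetical name (not part of distributed), and it assumes the returned dictionary has `scheduler`, `client`, and `workers` entries that each carry a `packages` mapping.

```python
from collections import defaultdict


def find_mismatches(versions):
    """Return {package: {version: [nodes]}} for packages whose versions differ.

    `versions` is assumed to follow the layout returned by
    distributed.Client.get_versions(): "scheduler" and "client" entries plus
    a "workers" mapping keyed by worker address, each with a "packages" dict.
    """
    nodes = {"scheduler": versions["scheduler"], "client": versions["client"]}
    nodes.update(versions.get("workers", {}))

    seen = defaultdict(lambda: defaultdict(list))
    for node, info in nodes.items():
        for package, version in info["packages"].items():
            seen[package][version].append(node)

    # Keep only packages that were reported with more than one version.
    return {
        package: dict(by_version)
        for package, by_version in seen.items()
        if len(by_version) > 1
    }


# Usage against a live cluster (not run here):
#   versions = client.get_versions(packages=["xgboost", "toolz"])
#   print(find_mismatches(versions))
```

An empty result would suggest the listed packages agree everywhere; anything else names the package and which nodes report which version.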

jacobtomlinson commented 3 years ago

The error you shared also seems to be coming directly from xgboost. Have you raised an issue there?

The problem here could well be a Dask issue, but we may need better error handling in xgboost to identify what it is.
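Until the assertion carries a message, it may help to look at the partition futures directly. The following is a rough sketch rather than anything xgboost does internally: `summarize_futures` is a hypothetical helper that relies only on the `.status` and `.key` attributes that `distributed.Future` objects expose.

```python
from collections import Counter


def summarize_futures(futures):
    """Count futures by status and list the keys of those not finished.

    Works on any objects exposing `.status` and `.key`, such as
    distributed.Future instances.
    """
    counts = Counter(f.status for f in futures)
    unfinished = [f.key for f in futures if f.status != "finished"]
    return counts, unfinished


# Usage against a live cluster (not run here):
#   from dask.distributed import futures_of, wait
#   X_train = X_train.persist()
#   futures = futures_of(X_train)
#   wait(futures)
#   print(summarize_futures(futures))
```

If any keys come back with a status such as "error" or "pending", calling `.exception()` on the corresponding future should show more than the bare `AssertionError` does.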

Chris-hughes10 commented 3 years ago

Hi @jacobtomlinson, I can confirm that I did install XGBoost on both the workers and the scheduler, by adding additional pip packages as chart values, and the environments are consistent.

I was unsure where to raise the issue to be honest, as it could be one of several components that is the root cause. I raised it here as it worked using LocalCluster but I am happy to raise elsewhere.

After further experimentation this morning, I have found that the issue does not occur, and training completes successfully, when using dask 2.30.0 and distributed 2.30.1. So perhaps it is better to raise this as an issue with Dask.

jacobtomlinson commented 3 years ago

Ah fair enough!

Could you share your full helm config so that I can try and reproduce it?

Are you able to reproduce the error with LocalCluster using more recent Dask versions?

Chris-hughes10 commented 3 years ago

Sure thing. I am using the latest version of the dask chart from helm, and installing it with:

helm install dask dask/dask \
  --set worker.env[0].name=EXTRA_APT_PACKAGES,worker.env[0].value='gcc' \
  --set worker.env[1].name=EXTRA_PIP_PACKAGES,worker.env[1].value='dask-ml numpy==1.19.2 fastparquet pyarrow adlfs xgboost s3fs scikit-learn --upgrade' \
  --set scheduler.env[0].name=EXTRA_PIP_PACKAGES,scheduler.env[0].value='xgboost scikit-learn --upgrade'

I wasn't able to reproduce the error locally; the training completed successfully using the local machine.

jacobtomlinson commented 3 years ago

Which dask-kubernetes version are you using?

Chris-hughes10 commented 3 years ago

I am using dask-kubernetes==0.11.0

jacobtomlinson commented 3 years ago

Hrm, I don't think I have enough information. I'm not getting the same error; instead I'm getting some more general Dask issues. Could you share your full conda environment?

Chris-hughes10 commented 3 years ago

Hi @jacobtomlinson, I continued to investigate this and I think I have found the issue. The problem appears to depend on the version of toolz installed: the failing environments have toolz 0.11.1 and the working ones have toolz 0.10.0. Here are the envs that I used, for both successful and unsuccessful runs:

Not Working

client
name: dask-new
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - argon2-cffi=20.1.0=py38h27cfd23_1
  - async_generator=1.10=pyhd3eb1b0_0
  - attrs=20.3.0=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - bleach=3.3.0=pyhd3eb1b0_0
  - ca-certificates=2021.1.19=h06a4308_0
  - certifi=2020.12.5=py38h06a4308_0
  - cffi=1.14.5=py38h261ae71_0
  - dbus=1.13.18=hb2f20db_0
  - decorator=4.4.2=pyhd3eb1b0_0
  - defusedxml=0.6.0=pyhd3eb1b0_0
  - entrypoints=0.3=py38_0
  - expat=2.2.10=he6710b0_2
  - fontconfig=2.13.1=h6c09931_0
  - freetype=2.10.4=h5ab3b9f_0
  - glib=2.67.4=h36276a3_1
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - icu=58.2=he6710b0_3
  - importlib-metadata=2.0.0=py_1
  - importlib_metadata=2.0.0=1
  - ipykernel=5.3.4=py38h5ca1d4c_0
  - ipython=7.21.0=py38hb070fc8_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - ipywidgets=7.6.3=pyhd3eb1b0_1
  - jedi=0.17.0=py38_0
  - jinja2=2.11.3=pyhd3eb1b0_0
  - jpeg=9b=h024ee3a_2
  - jsonschema=3.2.0=py_2
  - jupyter=1.0.0=py38_7
  - jupyter_client=6.1.7=py_0
  - jupyter_console=6.2.0=py_0
  - jupyter_core=4.7.1=py38h06a4308_0
  - jupyterlab_pygments=0.1.2=py_0
  - jupyterlab_widgets=1.0.0=pyhd3eb1b0_1
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libsodium=1.0.18=h7b6447c_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.10=hb55368b_3
  - markupsafe=1.1.1=py38h7b6447c_0
  - mistune=0.8.4=py38h7b6447c_1000
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.0.7=py38_0
  - nbformat=5.1.2=pyhd3eb1b0_1
  - ncurses=6.2=he6710b0_1
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - notebook=6.2.0=py38h06a4308_0
  - openssl=1.1.1j=h27cfd23_0
  - packaging=20.9=pyhd3eb1b0_0
  - pandoc=2.11=hb0f4dca_0
  - pandocfilters=1.4.3=py38h06a4308_1
  - parso=0.8.1=pyhd3eb1b0_0
  - pcre=8.44=he6710b0_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pip=21.0.1=py38h06a4308_0
  - prometheus_client=0.9.0=pyhd3eb1b0_0
  - prompt-toolkit=3.0.8=py_0
  - prompt_toolkit=3.0.8=0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pycparser=2.20=py_2
  - pygments=2.8.0=pyhd3eb1b0_0
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyqt=5.9.2=py38h05f1152_4
  - pyrsistent=0.17.3=py38h7b6447c_0
  - python=3.8.8=hdb3f193_4
  - python-dateutil=2.8.1=pyhd3eb1b0_0
  - pyzmq=20.0.0=py38h2531618_1
  - qt=5.9.7=h5867ecd_1
  - qtconsole=5.0.2=pyhd3eb1b0_0
  - qtpy=1.9.0=py_0
  - readline=8.1=h27cfd23_0
  - send2trash=1.5.0=pyhd3eb1b0_1
  - setuptools=52.0.0=py38h06a4308_0
  - sip=4.19.13=py38he6710b0_0
  - six=1.15.0=py38h06a4308_0
  - sqlite=3.33.0=h62c20be_0
  - terminado=0.9.2=py38h06a4308_0
  - testpath=0.4.4=pyhd3eb1b0_0
  - tk=8.6.10=hbc83047_0
  - tornado=6.1=py38h27cfd23_0
  - traitlets=5.0.5=pyhd3eb1b0_0
  - wcwidth=0.2.5=py_0
  - webencodings=0.5.1=py38_1
  - wheel=0.36.2=pyhd3eb1b0_0
  - widgetsnbextension=3.5.1=py38_0
  - xz=5.2.5=h7b6447c_0
  - zeromq=4.3.3=he6710b0_3
  - zipp=3.4.0=pyhd3eb1b0_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - aiobotocore==1.2.1
    - aiohttp==3.7.4
    - aioitertools==0.7.1
    - async-timeout==3.0.1
    - blosc==1.9.2
    - botocore==1.19.52
    - cachetools==4.2.1
    - chardet==3.0.4
    - click==7.1.2
    - cloudpickle==1.6.0
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-kubernetes==0.11.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - fsspec==0.8.7
    - google-auth==1.27.0
    - heapdict==1.0.1
    - idna==2.10
    - jmespath==0.10.0
    - joblib==1.0.1
    - kubernetes==12.0.1
    - kubernetes-asyncio==12.0.1
    - llvmlite==0.35.0
    - locket==0.2.1
    - lz4==3.1.1
    - msgpack==1.0.2
    - multidict==5.1.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - numpy==1.20.1
    - oauthlib==3.1.0
    - pandas==1.2.3
    - partd==1.1.0
    - psutil==5.8.0
    - pyarrow==3.0.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.25.1
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - s3fs==0.5.2
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - sortedcontainers==2.3.0
    - tblib==1.7.0
    - threadpoolctl==2.1.0
    - toolz==0.11.1
    - typing-extensions==3.7.4.3
    - urllib3==1.26.3
    - websocket-client==0.58.0
    - wrapt==1.12.1
    - xgboost==1.3.3
    - yarl==1.6.3
    - zict==2.0.0
prefix: /anaconda/envs/dask-new
worker
name: base
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blosc=1.20.1=he1b5a44_0
  - bokeh=2.1.1=py38h32f6830_0
  - brotlipy=0.7.0=py38h8df0ef7_1001
  - ca-certificates=2020.12.5=ha878542_0
  - certifi=2020.12.5=py38h578d9bd_1
  - cffi=1.14.4=py38ha312104_0
  - click=7.1.2=pyh9f0ad1d_0
  - cloudpickle=1.6.0=py_0
  - conda=4.9.2=py38h578d9bd_0
  - conda-package-handling=1.7.2=py38h8df0ef7_0
  - cryptography=3.2.1=py38h7699a38_0
  - cytoolz=0.11.0=py38h25fe258_1
  - freetype=2.10.4=h7ca028e_0
  - fsspec=0.8.5=pyhd8ed1ab_0
  - heapdict=1.0.1=py_0
  - idna=2.10=pyh9f0ad1d_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jpeg=9d=h36c2ea0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libblas=3.9.0=7_openblas
  - libcblas=3.9.0=7_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.5.0=h14aa051_18
  - libgfortran4=7.5.0=h14aa051_18
  - liblapack=3.9.0=7_openblas
  - libopenblas=0.3.12=pthreads_hb3c22a3_1
  - libpng=1.6.37=h21135ba_2
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.10=h9022e91_1002
  - locket=0.2.0=py_2
  - lz4=3.1.1=py38h87b837d_0
  - lz4-c=1.9.2=he1b5a44_3
  - markupsafe=1.1.1=py38h8df0ef7_2
  - msgpack-python=1.0.0=py38h82cb98a_2
  - ncurses=6.2=he6710b0_0
  - nomkl=1.0=h5ca1d4c_0
  - numpy=1.18.1=py38h8854b6b_1
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1h=h516909a_0
  - packaging=20.8=pyhd3deb0d_0
  - partd=1.1.0=py_0
  - pillow=6.2.1=py38h34e0f95_0
  - pip=20.3.3=pyhd8ed1ab_0
  - psutil=5.7.3=py38h8df0ef7_0
  - pycosat=0.6.3=py38h8df0ef7_1005
  - pycparser=2.20=pyh9f0ad1d_2
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysocks=1.7.1=py38h578d9bd_3
  - python=3.8.0=h0371630_2
  - python-blosc=1.9.2=py38h0ef3d22_3
  - python-dateutil=2.8.1=py_0
  - python_abi=3.8=1_cp38
  - pytz=2020.5=pyhd8ed1ab_0
  - pyyaml=5.1.2=py38h516909a_0
  - readline=7.0=h7b6447c_5
  - requests=2.25.1=pyhd3deb0d_0
  - ruamel_yaml=0.15.87=py38h7b6447c_0
  - setuptools=49.6.0=py38h578d9bd_3
  - six=1.15.0=pyh9f0ad1d_0
  - sortedcontainers=2.3.0=pyhd8ed1ab_0
  - sqlite=3.31.1=h7b6447c_0
  - tblib=1.6.0=py_0
  - tini=0.18.0=h14c3975_1001
  - tk=8.6.8=hbc83047_0
  - toolz=0.11.1=py_0
  - tornado=6.1=py38h25fe258_0
  - tqdm=4.42.1=py_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.26.2=pyhd8ed1ab_0
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zict=2.0.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.3=1
  - pip:
    - aiobotocore==1.2.1
    - aiohttp==3.7.4
    - aioitertools==0.7.1
    - async-timeout==3.0.1
    - attrs==20.3.0
    - botocore==1.19.52
    - chardet==3.0.4
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - fastparquet==0.5.0
    - jmespath==0.10.0
    - joblib==1.0.1
    - llvmlite==0.35.0
    - multidict==5.1.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - pandas==1.2.3
    - pyarrow==3.0.0
    - s3fs==0.5.2
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - threadpoolctl==2.1.0
    - thrift==0.13.0
    - wrapt==1.12.1
    - xgboost==1.3.3
    - yarl==1.6.3
prefix: /opt/conda
scheduler
name: base
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blosc=1.20.1=he1b5a44_0
  - bokeh=2.1.1=py38h32f6830_0
  - brotlipy=0.7.0=py38h8df0ef7_1001
  - ca-certificates=2020.12.5=ha878542_0
  - certifi=2020.12.5=py38h578d9bd_1
  - cffi=1.14.4=py38ha312104_0
  - chardet=4.0.0=py38h578d9bd_1
  - click=7.1.2=pyh9f0ad1d_0
  - cloudpickle=1.6.0=py_0
  - conda=4.9.2=py38h578d9bd_0
  - conda-package-handling=1.7.2=py38h8df0ef7_0
  - cryptography=3.2.1=py38h7699a38_0
  - cytoolz=0.11.0=py38h25fe258_1
  - freetype=2.10.4=h7ca028e_0
  - fsspec=0.8.5=pyhd8ed1ab_0
  - heapdict=1.0.1=py_0
  - idna=2.10=pyh9f0ad1d_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jpeg=9d=h36c2ea0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libblas=3.9.0=7_openblas
  - libcblas=3.9.0=7_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.5.0=h14aa051_18
  - libgfortran4=7.5.0=h14aa051_18
  - liblapack=3.9.0=7_openblas
  - libopenblas=0.3.12=pthreads_hb3c22a3_1
  - libpng=1.6.37=h21135ba_2
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.10=h9022e91_1002
  - locket=0.2.0=py_2
  - lz4=3.1.1=py38h87b837d_0
  - lz4-c=1.9.2=he1b5a44_3
  - markupsafe=1.1.1=py38h8df0ef7_2
  - msgpack-python=1.0.0=py38h82cb98a_2
  - ncurses=6.2=he6710b0_0
  - nomkl=1.0=h5ca1d4c_0
  - numpy=1.18.1=py38h8854b6b_1
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1h=h516909a_0
  - packaging=20.8=pyhd3deb0d_0
  - pandas=1.0.1=py38hb3f55d8_0
  - partd=1.1.0=py_0
  - pillow=6.2.1=py38h34e0f95_0
  - pip=20.3.3=pyhd8ed1ab_0
  - psutil=5.7.3=py38h8df0ef7_0
  - pycosat=0.6.3=py38h8df0ef7_1005
  - pycparser=2.20=pyh9f0ad1d_2
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysocks=1.7.1=py38h578d9bd_3
  - python=3.8.0=h0371630_2
  - python-blosc=1.9.2=py38h0ef3d22_3
  - python-dateutil=2.8.1=py_0
  - python_abi=3.8=1_cp38
  - pytz=2020.5=pyhd8ed1ab_0
  - pyyaml=5.1.2=py38h516909a_0
  - readline=7.0=h7b6447c_5
  - requests=2.25.1=pyhd3deb0d_0
  - ruamel_yaml=0.15.87=py38h7b6447c_0
  - setuptools=49.6.0=py38h578d9bd_3
  - six=1.15.0=pyh9f0ad1d_0
  - sortedcontainers=2.3.0=pyhd8ed1ab_0
  - sqlite=3.31.1=h7b6447c_0
  - tblib=1.6.0=py_0
  - tini=0.18.0=h14c3975_1001
  - tk=8.6.8=hbc83047_0
  - toolz=0.11.1=py_0
  - tornado=6.1=py38h25fe258_0
  - tqdm=4.42.1=py_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.26.2=pyhd8ed1ab_0
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zict=2.0.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.3=1
  - pip:
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - joblib==1.0.1
    - llvmlite==0.35.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - threadpoolctl==2.1.0
    - xgboost==1.3.3

Working

client
name: dask-new
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - argon2-cffi=20.1.0=py38h27cfd23_1
  - async_generator=1.10=pyhd3eb1b0_0
  - attrs=20.3.0=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - bleach=3.3.0=pyhd3eb1b0_0
  - ca-certificates=2021.1.19=h06a4308_0
  - certifi=2020.12.5=py38h06a4308_0
  - cffi=1.14.5=py38h261ae71_0
  - dbus=1.13.18=hb2f20db_0
  - decorator=4.4.2=pyhd3eb1b0_0
  - defusedxml=0.6.0=pyhd3eb1b0_0
  - entrypoints=0.3=py38_0
  - expat=2.2.10=he6710b0_2
  - fontconfig=2.13.1=h6c09931_0
  - freetype=2.10.4=h5ab3b9f_0
  - glib=2.67.4=h36276a3_1
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - icu=58.2=he6710b0_3
  - importlib-metadata=2.0.0=py_1
  - importlib_metadata=2.0.0=1
  - ipykernel=5.3.4=py38h5ca1d4c_0
  - ipython=7.21.0=py38hb070fc8_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - ipywidgets=7.6.3=pyhd3eb1b0_1
  - jedi=0.17.0=py38_0
  - jinja2=2.11.3=pyhd3eb1b0_0
  - jpeg=9b=h024ee3a_2
  - jsonschema=3.2.0=py_2
  - jupyter=1.0.0=py38_7
  - jupyter_client=6.1.7=py_0
  - jupyter_console=6.2.0=py_0
  - jupyter_core=4.7.1=py38h06a4308_0
  - jupyterlab_pygments=0.1.2=py_0
  - jupyterlab_widgets=1.0.0=pyhd3eb1b0_1
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libsodium=1.0.18=h7b6447c_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.10=hb55368b_3
  - markupsafe=1.1.1=py38h7b6447c_0
  - mistune=0.8.4=py38h7b6447c_1000
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.0.7=py38_0
  - nbformat=5.1.2=pyhd3eb1b0_1
  - ncurses=6.2=he6710b0_1
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - notebook=6.2.0=py38h06a4308_0
  - openssl=1.1.1j=h27cfd23_0
  - packaging=20.9=pyhd3eb1b0_0
  - pandoc=2.11=hb0f4dca_0
  - pandocfilters=1.4.3=py38h06a4308_1
  - parso=0.8.1=pyhd3eb1b0_0
  - pcre=8.44=he6710b0_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pip=21.0.1=py38h06a4308_0
  - prometheus_client=0.9.0=pyhd3eb1b0_0
  - prompt-toolkit=3.0.8=py_0
  - prompt_toolkit=3.0.8=0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pycparser=2.20=py_2
  - pygments=2.8.0=pyhd3eb1b0_0
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyqt=5.9.2=py38h05f1152_4
  - pyrsistent=0.17.3=py38h7b6447c_0
  - python=3.8.8=hdb3f193_4
  - python-dateutil=2.8.1=pyhd3eb1b0_0
  - pyzmq=20.0.0=py38h2531618_1
  - qt=5.9.7=h5867ecd_1
  - qtconsole=5.0.2=pyhd3eb1b0_0
  - qtpy=1.9.0=py_0
  - readline=8.1=h27cfd23_0
  - send2trash=1.5.0=pyhd3eb1b0_1
  - setuptools=52.0.0=py38h06a4308_0
  - sip=4.19.13=py38he6710b0_0
  - six=1.15.0=py38h06a4308_0
  - sqlite=3.33.0=h62c20be_0
  - terminado=0.9.2=py38h06a4308_0
  - testpath=0.4.4=pyhd3eb1b0_0
  - tk=8.6.10=hbc83047_0
  - tornado=6.1=py38h27cfd23_0
  - traitlets=5.0.5=pyhd3eb1b0_0
  - wcwidth=0.2.5=py_0
  - webencodings=0.5.1=py38_1
  - wheel=0.36.2=pyhd3eb1b0_0
  - widgetsnbextension=3.5.1=py38_0
  - xz=5.2.5=h7b6447c_0
  - zeromq=4.3.3=he6710b0_3
  - zipp=3.4.0=pyhd3eb1b0_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - aiobotocore==1.2.1
    - aiohttp==3.7.4
    - aioitertools==0.7.1
    - async-timeout==3.0.1
    - blosc==1.9.2
    - botocore==1.19.52
    - cachetools==4.2.1
    - chardet==3.0.4
    - click==7.1.2
    - cloudpickle==1.6.0
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-kubernetes==0.11.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - fsspec==0.8.7
    - google-auth==1.27.0
    - heapdict==1.0.1
    - idna==2.10
    - jmespath==0.10.0
    - joblib==1.0.1
    - kubernetes==12.0.1
    - kubernetes-asyncio==12.0.1
    - llvmlite==0.35.0
    - locket==0.2.1
    - lz4==3.1.1
    - msgpack==1.0.2
    - multidict==5.1.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - numpy==1.20.1
    - oauthlib==3.1.0
    - pandas==1.2.3
    - partd==1.1.0
    - psutil==5.8.0
    - pyarrow==3.0.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.25.1
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - s3fs==0.5.2
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - sortedcontainers==2.3.0
    - tblib==1.7.0
    - threadpoolctl==2.1.0
    - toolz==0.10.0
    - typing-extensions==3.7.4.3
    - urllib3==1.26.3
    - websocket-client==0.58.0
    - wrapt==1.12.1
    - xgboost==1.3.3
    - yarl==1.6.3
    - zict==2.0.0
prefix: /anaconda/envs/dask-new
worker
name: base
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blosc=1.20.1=he1b5a44_0
  - bokeh=2.1.1=py38h32f6830_0
  - brotlipy=0.7.0=py38h8df0ef7_1001
  - ca-certificates=2020.12.5=ha878542_0
  - certifi=2020.12.5=py38h578d9bd_1
  - cffi=1.14.4=py38ha312104_0
  - click=7.1.2=pyh9f0ad1d_0
  - cloudpickle=1.6.0=py_0
  - conda=4.9.2=py38h578d9bd_0
  - conda-package-handling=1.7.2=py38h8df0ef7_0
  - cryptography=3.2.1=py38h7699a38_0
  - cytoolz=0.11.0=py38h25fe258_1
  - freetype=2.10.4=h7ca028e_0
  - fsspec=0.8.5=pyhd8ed1ab_0
  - heapdict=1.0.1=py_0
  - idna=2.10=pyh9f0ad1d_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jpeg=9d=h36c2ea0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libblas=3.9.0=7_openblas
  - libcblas=3.9.0=7_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.5.0=h14aa051_18
  - libgfortran4=7.5.0=h14aa051_18
  - liblapack=3.9.0=7_openblas
  - libopenblas=0.3.12=pthreads_hb3c22a3_1
  - libpng=1.6.37=h21135ba_2
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.10=h9022e91_1002
  - locket=0.2.0=py_2
  - lz4=3.1.1=py38h87b837d_0
  - lz4-c=1.9.2=he1b5a44_3
  - markupsafe=1.1.1=py38h8df0ef7_2
  - msgpack-python=1.0.0=py38h82cb98a_2
  - ncurses=6.2=he6710b0_0
  - nomkl=1.0=h5ca1d4c_0
  - numpy=1.18.1=py38h8854b6b_1
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1h=h516909a_0
  - packaging=20.8=pyhd3deb0d_0
  - partd=1.1.0=py_0
  - pillow=6.2.1=py38h34e0f95_0
  - pip=20.3.3=pyhd8ed1ab_0
  - psutil=5.7.3=py38h8df0ef7_0
  - pycosat=0.6.3=py38h8df0ef7_1005
  - pycparser=2.20=pyh9f0ad1d_2
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysocks=1.7.1=py38h578d9bd_3
  - python=3.8.0=h0371630_2
  - python-blosc=1.9.2=py38h0ef3d22_3
  - python-dateutil=2.8.1=py_0
  - python_abi=3.8=1_cp38
  - pytz=2020.5=pyhd8ed1ab_0
  - pyyaml=5.1.2=py38h516909a_0
  - readline=7.0=h7b6447c_5
  - requests=2.25.1=pyhd3deb0d_0
  - ruamel_yaml=0.15.87=py38h7b6447c_0
  - setuptools=49.6.0=py38h578d9bd_3
  - six=1.15.0=pyh9f0ad1d_0
  - sortedcontainers=2.3.0=pyhd8ed1ab_0
  - sqlite=3.31.1=h7b6447c_0
  - tblib=1.6.0=py_0
  - tini=0.18.0=h14c3975_1001
  - tk=8.6.8=hbc83047_0
  - tornado=6.1=py38h25fe258_0
  - tqdm=4.42.1=py_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.26.2=pyhd8ed1ab_0
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zict=2.0.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.3=1
  - pip:
    - aiobotocore==1.2.1
    - aiohttp==3.7.4
    - aioitertools==0.7.1
    - async-timeout==3.0.1
    - attrs==20.3.0
    - botocore==1.19.52
    - chardet==3.0.4
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - fastparquet==0.5.0
    - jmespath==0.10.0
    - joblib==1.0.1
    - llvmlite==0.35.0
    - multidict==5.1.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - pandas==1.2.3
    - pyarrow==3.0.0
    - s3fs==0.5.2
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - threadpoolctl==2.1.0
    - thrift==0.13.0
    - toolz==0.10.0
    - wrapt==1.12.1
    - xgboost==1.3.3
    - yarl==1.6.3
scheduler
name: base
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - blosc=1.20.1=he1b5a44_0
  - bokeh=2.1.1=py38h32f6830_0
  - brotlipy=0.7.0=py38h8df0ef7_1001
  - ca-certificates=2020.12.5=ha878542_0
  - certifi=2020.12.5=py38h578d9bd_1
  - cffi=1.14.4=py38ha312104_0
  - chardet=4.0.0=py38h578d9bd_1
  - click=7.1.2=pyh9f0ad1d_0
  - cloudpickle=1.6.0=py_0
  - conda=4.9.2=py38h578d9bd_0
  - conda-package-handling=1.7.2=py38h8df0ef7_0
  - cryptography=3.2.1=py38h7699a38_0
  - cytoolz=0.11.0=py38h25fe258_1
  - freetype=2.10.4=h7ca028e_0
  - fsspec=0.8.5=pyhd8ed1ab_0
  - heapdict=1.0.1=py_0
  - idna=2.10=pyh9f0ad1d_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jpeg=9d=h36c2ea0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libblas=3.9.0=7_openblas
  - libcblas=3.9.0=7_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.5.0=h14aa051_18
  - libgfortran4=7.5.0=h14aa051_18
  - liblapack=3.9.0=7_openblas
  - libopenblas=0.3.12=pthreads_hb3c22a3_1
  - libpng=1.6.37=h21135ba_2
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.10=h9022e91_1002
  - locket=0.2.0=py_2
  - lz4=3.1.1=py38h87b837d_0
  - lz4-c=1.9.2=he1b5a44_3
  - markupsafe=1.1.1=py38h8df0ef7_2
  - msgpack-python=1.0.0=py38h82cb98a_2
  - ncurses=6.2=he6710b0_0
  - nomkl=1.0=h5ca1d4c_0
  - numpy=1.18.1=py38h8854b6b_1
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1h=h516909a_0
  - packaging=20.8=pyhd3deb0d_0
  - pandas=1.0.1=py38hb3f55d8_0
  - partd=1.1.0=py_0
  - pillow=6.2.1=py38h34e0f95_0
  - pip=20.3.3=pyhd8ed1ab_0
  - psutil=5.7.3=py38h8df0ef7_0
  - pycosat=0.6.3=py38h8df0ef7_1005
  - pycparser=2.20=pyh9f0ad1d_2
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysocks=1.7.1=py38h578d9bd_3
  - python=3.8.0=h0371630_2
  - python-blosc=1.9.2=py38h0ef3d22_3
  - python-dateutil=2.8.1=py_0
  - python_abi=3.8=1_cp38
  - pytz=2020.5=pyhd8ed1ab_0
  - pyyaml=5.1.2=py38h516909a_0
  - readline=7.0=h7b6447c_5
  - requests=2.25.1=pyhd3deb0d_0
  - ruamel_yaml=0.15.87=py38h7b6447c_0
  - setuptools=49.6.0=py38h578d9bd_3
  - six=1.15.0=pyh9f0ad1d_0
  - sortedcontainers=2.3.0=pyhd8ed1ab_0
  - sqlite=3.31.1=h7b6447c_0
  - tblib=1.6.0=py_0
  - tini=0.18.0=h14c3975_1001
  - tk=8.6.8=hbc83047_0
  - tornado=6.1=py38h25fe258_0
  - tqdm=4.42.1=py_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.26.2=pyhd8ed1ab_0
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zict=2.0.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.3=1
  - pip:
    - dask==2021.2.0
    - dask-glm==0.2.0
    - dask-ml==1.8.0
    - distributed==2021.2.0
    - joblib==1.0.1
    - llvmlite==0.35.0
    - multipledispatch==0.6.0
    - numba==0.52.0
    - scikit-learn==0.24.1
    - scipy==1.6.1
    - threadpoolctl==2.1.0
    - toolz==0.10.0
    - xgboost==1.3.3
prefix: /opt/conda

Additionally, it only succeeds when these lines are run first:

X_train = X_train.persist()
y_train = y_train.persist()

otherwise the error still occurs.
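Since the exports above are long and differ in only a few pins, diffing them mechanically is easier than reading them side by side. This is a small sketch with hypothetical helper names (`parse_pins`, `diff_pins`); it assumes the `conda env export` layout shown above, with conda entries as `name=version=build` and pip entries as `name==version`.

```python
def parse_pins(export_text):
    """Map package name -> version from a `conda env export` style listing.

    Handles conda entries (`- name=version=build`) and pip entries
    (`- name==version`). Lines without a pin (name:, channels:, prefix:,
    channel names, `- pip:`) are skipped.
    """
    pins = {}
    for raw in export_text.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or "=" not in line:
            continue
        entry = line[2:]
        if "==" in entry:
            name, version = entry.split("==", 1)
        else:
            name, version = entry.split("=")[:2]
        pins[name] = version  # a later (pip) entry overrides an earlier one
    return pins


def diff_pins(left, right):
    """Return {name: (left_version, right_version)} where the pins differ."""
    names = sorted(set(left) | set(right))
    return {
        name: (left.get(name), right.get(name))
        for name in names
        if left.get(name) != right.get(name)
    }


# Usage (file names are placeholders, not run here):
#   failing = parse_pins(open("worker-not-working.yml").read())
#   working = parse_pins(open("worker-working.yml").read())
#   print(diff_pins(failing, working))
```

A package present in only one export shows up with `None` on the other side, so missing installs are reported alongside version changes.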

jacobtomlinson commented 3 years ago

Thanks for the info.

It definitely sounds like this is an environment issue and some versions are not playing nicely together. Additionally, I think the fact that you have to persist the data could point to an issue in XGBoost.

I'm not sure there is anything we can change in this project (dask-kubernetes) to resolve this for you.