blis-teng opened this issue 1 year ago
Thank you for this report!
@nvliyuan do you want to take a look at this?
Actually, nvliyuan has been contributing to the spark runtime. Not certain who to tap about the dask runtime. I'll check the commit history shortly and get back to you.
Hi @cjac, the dask script has been failing since the 22.06 release (2022.06), see these comments, so I believe this issue has existed for a long time. Maybe @mengdong @sameerz could pull in some dask-rapids folks?
Hey folks! I work on RAPIDS and Dask, happy to help. We are currently in the process of documenting and testing deploying RAPIDS on cloud platforms but I expect we will not get to Dataproc until after the holidays. But we will definitely dig into this as part of that work.
Pinging @mroeschke who may have some quick thoughts about the Pandas error. I expect pandas needs upgrading/downgrading.
I suspect your environment has pandas>=1.5 installed, and cudf was not compatible with that version of pandas until 22.10.
Therefore, if you downgrade to pandas<1.5 or upgrade to cudf>=22.10, the error No module named 'pandas.core.arrays._arrow_utils' should go away.
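If it helps, here is a minimal sketch of both workarounds as conda commands (the channels and the cudatoolkit pin are illustrative; adjust them to match your cluster):
# Option 1: keep cudf 22.04 and pin pandas below 1.5
conda install -y -c conda-forge "pandas<1.5"
# Option 2: move to a cudf release that supports pandas 1.5+
conda install -y -c rapidsai -c nvidia -c conda-forge "cudf>=22.10" "cudatoolkit=11.5"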
Thank you Jacob and Matt!
@blis-teng - please let us know if this solves this issue for you so we can mark the issue resolved or otherwise offer an appropriate solution.
@blis-teng - are you able to share the gcloud dataproc clusters create command you're using to spin up your cluster? I can try to reproduce it and see if I run into the same problems.
If you've got a support contract with GCP, I'd appreciate if you could open a support case and provide me the case #. By doing this, we can track our work and share case details privately rather than on the permanent record for the initialization-actions repository. Please do not open development cases as P2 or P1, as those are reserved for production outage situations, and development is by definition not a production environment.
C.J. in Cloud Support, Seattle
I suspect your environment has pandas>=1.5 installed, and cudf was not compatible with that version of pandas until 22.10.
Therefore, if you downgrade to pandas<1.5 or upgrade to cudf>=22.10, the error No module named 'pandas.core.arrays._arrow_utils' should go away.
I have tried that, but it did not work.
I used the command line from the documentation at https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/rapids/README.md Minor details may differ, but the key parameters (GPU driver, rapids-runtime) are the same.
export CLUSTER_NAME=<cluster_name>
export GCS_BUCKET=<your bucket for the logs and notebooks>
export REGION=<region>
export NUM_GPUS=1
export NUM_WORKERS=2
gcloud dataproc clusters create $CLUSTER_NAME \
--region $REGION \
--image-version=dp20 \
--master-machine-type n1-custom-63500 \
--num-workers $NUM_WORKERS \
--worker-accelerator type=nvidia-tesla-t4,count=$NUM_GPUS \
--worker-machine-type n1-standard-8 \
--num-worker-local-ssds 1 \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh \
--optional-components=JUPYTER,ZEPPELIN \
--metadata gpu-driver-provider="NVIDIA",rapids-runtime="DASK" \
--bucket $GCS_BUCKET \
okay, I'll try to reproduce it now.
With these arguments, it is installing pandas-1.2.5 and libcudf-22.04.00-cuda11. I think I found a bug in the rapids.sh script. I'll see if patching it improves the situation.
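For anyone following along, this is roughly how I checked the installed versions (the worker name follows the pattern from my repro cluster and the zone is a placeholder; depending on how rapids.sh sets things up, the packages may land in the base environment or in a dedicated dask-rapids one):
gcloud compute ssh "${CLUSTER_NAME}-w-0" --zone "${ZONE}"
/opt/conda/default/bin/conda list | grep -E 'pandas|cudf'
/opt/conda/default/bin/conda list -n dask-rapids | grep -E 'pandas|cudf'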
In order to use 22.10 with pandas>=1.5, I need to upgrade these python packages:
"cuspatial=${CUSPATIAL_VERSION}" "rope>=0.9.4" "gdal>3.5.0"
And gdal>3.5.0 is not available in bullseye. Backports only go up to 3.2, so I'm going to try ubuntu20.
cjac@cluster-1668020639-w-0:~$ apt-cache show libgdal-dev | grep ^Version
Version: 3.0.4+dfsg-1build3
cjac@cluster-1668020639-w-0:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
So no, it looks like the pandas >= 1.5 path isn't viable here. I'll try pinning to the lower versions instead.
+ mamba install -y --no-channel-priority -c conda-forge -c nvidia -c rapidsai cudatoolkit=11.5 'pandas<1.5' rapids=22.04
Looking for: ['cudatoolkit=11.5', "pandas[version='<1.5']", 'rapids=22.04']
Pinned packages:
- python 3.10.*
- conda 22.9.*
- python 3.10.*
- r-base 4.1.*
- r-recommended 4.1.*
Encountered problems while solving:
- package rapids-22.04.00-cuda11_py39_ge08d166_149 requires python >=3.9,<3.10.0a0, but none of the providers can be installed
cjac@cluster-1668020639-w-0:~$ which conda
/opt/conda/default/bin/conda
cjac@cluster-1668020639-w-0:~$ /opt/conda/default/bin/python --version
Python 3.10.8
Now it looks like the python interpreter we install with dataproc is too new for the rapids release. I'll try 22.06 and 22.08 to see if either of those versions work.
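To see which interpreter a given release will accept before burning another cluster build, something like this conda search against the rapidsai channel lists the python constraint for each build:
conda search -c rapidsai -c nvidia -c conda-forge 'rapids=22.06' --info | grep 'python >'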
Okay, I was able to get this working on 2.0-debian10 with dask-rapids 22.06
I had to specify this mamba command:
mamba install -n 'dask-rapids' -y --no-channel-priority -c 'conda-forge' -c 'nvidia' -c 'rapidsai' \
  "cudatoolkit=${CUDA_VERSION}" "pandas<1.5" "rapids=${RAPIDS_VERSION}" "python=3.9"
I'm testing the change with dask-rapids 22.08 ; if that works as well, I will submit a PR.
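For completeness, the smoke test I'm using against the new environment is just an import check along these lines (illustrative; the env name matches the mamba command above):
conda run -n dask-rapids python -c 'import pandas, cudf, dask_cudf; print(pandas.__version__, cudf.__version__)'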
@blis-teng - please try replacing the rapids.sh you link to from your project's initialization-actions checkout with this one.
https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh
I am working with the product team to review this change. I should be able to close up PR #1041 pretty quick here.
It sounds like you may not yet have read the README.md[1] from the initialization-actions repository. Can you please review it and confirm that you understand where to copy rapids.sh[2] from my pre-release branch for testing?
[1] https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used [2] https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh
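In concrete terms, that README guidance amounts to staging the script in a bucket you control and pointing --initialization-actions at your copy; a rough sketch, reusing the GCS_BUCKET variable from your command above:
curl -L -o rapids.sh https://github.com/cjac/initialization-actions/raw/dask-rapids-202212/rapids/rapids.sh
gsutil cp rapids.sh "gs://${GCS_BUCKET}/rapids/rapids.sh"
# then pass your copy at cluster creation time, e.g.
#   --initialization-actions gs://${GCS_BUCKET}/rapids/rapids.sh,...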
@blis-teng can you re-try using the latest rapids/rapids.sh from github?
hi, @cjac sorry for the late reply, I will re-try the new rapids.sh and get back to you next week, thanks!
Thank you. Standing by for confirmation! 20230106T084758 + 7d will be 20230113T084757.
I am presently not able to reproduce your problem. If there is still a change to be made, I'd like to know that information early in the week, please.
Please remember to read the README I referenced. You are violating the guidance by using
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/gpu/install_gpu_driver.sh,gs://goog-dataproc-initialization-actions-${REGION}/rapids/rapids.sh \
Hi @cjac, could you please update to work with the latest dask-rapids v22.12?
Not last I checked. What versions are you pinning to?
@cjac
root@test-dataproc-rapids-dask-m:/# conda list ^cu
# packages in environment at /opt/conda/miniconda3:
#
# Name Version Build Channel
cucim 22.04.00 cuda_11_py38_g8dfed80_0 rapidsai
cuda-python 11.8.1 py38h241159d_2 conda-forge
cudatoolkit 11.2.72 h2bc3f7f_0 nvidia
cudf 22.04.00 cuda_11_py38_g8bf0520170_0 rapidsai
cudf_kafka 22.04.00 py38_g8bf0520170_0 rapidsai
cugraph 22.04.00 cuda11_py38_g58be5b53_0 rapidsai
cuml 22.04.00 cuda11_py38_g95abbc746_0 rapidsai
cupy 9.6.0 py38h177b0fd_0 conda-forge
cupy-cuda115 10.6.0 pypi_0 pypi
curl 7.86.0 h7bff187_1 conda-forge
cusignal 22.04.00 py39_g06f58b4_0 rapidsai
cuspatial 22.04.00 py38_ge8f9f84_0 rapidsai
custreamz 22.04.00 py38_g8bf0520170_0 rapidsai
cuxfilter 22.04.00 py38_gf251a67_0 rapidsai
root@test-dataproc-rapids-dask-m:/# conda list ^das
# packages in environment at /opt/conda/miniconda3:
#
# Name Version Build Channel
dask 2022.3.0 pyhd8ed1ab_1 conda-forge
dask-bigquery 2022.5.0 pyhd8ed1ab_0 conda-forge
dask-core 2022.3.0 pyhd8ed1ab_0 conda-forge
dask-cuda 22.04.00 py38_0 rapidsai
dask-cudf 22.04.00 cuda_11_py38_g8bf0520170_0 rapidsai
dask-glm 0.2.0 py_1 conda-forge
dask-ml 2022.5.27 pyhd8ed1ab_0 conda-forge
dask-sql 2022.8.0 pyhd8ed1ab_0 conda-forge
dask-yarn 0.9 py38h578d9bd_2 conda-forge
I was wondering if you could upgrade rapids.sh to install the latest rapids v22.12, or is there a reason not to? (PS: I am aware you recently upgraded to 22.10, which I have yet to test.)
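If it helps while testing, a quick way to see which release a given copy of rapids.sh pins, assuming it still uses the RAPIDS_VERSION and CUDA_VERSION variables shown in the mamba command earlier in this thread, is:
grep -E 'RAPIDS_VERSION|CUDA_VERSION' rapids.sh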
Hi @cjac, could you please update to work with the latest dask-rapids v22.12?
I'm about to go on vacation, and I'm trying to put projects down. Can you open a new issue or better yet a GCP support case so I don't lose track of the work item, please?
This issue is about the action not working. I think it's working now, just not yet updated to the latest release. A separate issue would be appropriate.
I am trying to set up a Dataproc cluster with GPUs attached in order to use cuml and cudf. I followed the instructions at https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/rapids/README.md and was able to set up the cluster with the NVIDIA driver successfully installed. But when I try
it throws out the error
I followed the instructions here: https://docs.rapids.ai/notices/rsn0020/ But after the downgrade, another error shows up when importing cudf, which is
The dask-rapids installation version in rapids.sh is 22.04.