Kaggle / docker-python

Kaggle Python docker image
Apache License 2.0

CHAOS AT CURRENT CUDF WITH RAPIDS DRIVERS #1361

Open Hvnt3rK3ys opened 7 months ago

Hvnt3rK3ys commented 7 months ago

🐛 Bug

To Reproduce

import cudf as cf #Use Rapids framework dataframe for GPU (PANDAS)
import cupy as cp #Use Rapids framework arrays for GPU (NUMPY)

This produces the following log:

/opt/conda/lib/python3.10/site-packages/cudf/utils/_numba.py:110: UserWarning: Using CUDA toolkit version (12, 3) with CUDA driver version (12, 2) requires minor version compatibility, which is not yet supported for CUDA driver versions 12.0 and above. It is likely that many cuDF operations will not work in this state. Please install CUDA toolkit version (12, 2) to continue using cuDF.
  warnings.warn(

Expected behavior

🧐 No warning in the log 🧐

Additional context

The current CUDA drivers in the Docker ENV are:

| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2    
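The mismatch in the warning (toolkit 12.3 against driver 12.2) can be checked from Python. The following is a minimal sketch, not how cuDF performs its own check: the helper names are hypothetical, and it assumes CuPy's `cupy.cuda.runtime.runtimeGetVersion()` and `driverGetVersion()` report the versions of interest. The toolkit version that cuDF and Numba detect may come from a different package, so treat the result as a hint only.

```python
# Sketch: decode CUDA's integer version encoding and compare toolkit vs. driver.
# CUDA encodes versions as 1000 * major + 10 * minor (for example 12020 for 12.2).


def decode_cuda_version(encoded: int) -> tuple:
    """Turn an encoded CUDA version such as 12030 into (12, 3)."""
    return encoded // 1000, (encoded % 1000) // 10


def toolkit_newer_than_driver(runtime_encoded: int, driver_encoded: int) -> bool:
    """True when the runtime (toolkit) version exceeds what the driver reports."""
    return decode_cuda_version(runtime_encoded) > decode_cuda_version(driver_encoded)


def read_versions_with_cupy() -> tuple:
    """Query (runtime, driver) through CuPy; needs a working GPU environment."""
    import cupy as cp  # imported lazily so the helpers above work without a GPU

    return cp.cuda.runtime.runtimeGetVersion(), cp.cuda.runtime.driverGetVersion()
```

On a GPU machine, `toolkit_newer_than_driver(*read_versions_with_cupy())` returning `True` would correspond to the situation described in the warning.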

Possible solutions:

  1. Update the RAPIDS toolkit to match the drivers (this does not work in the end):

    
    #GET CUDF INNER CONFLICT 
    /opt/conda/lib/python3.10/site-packages/cudf/utils/_numba.py:17: UserWarning: CUDA Toolkit is newer than CUDA driver. Numba features will not work in this configuration. 
    warnings.warn(
    /opt/conda/lib/python3.10/site-packages/cupy/_environment.py:487: UserWarning: 
    --------------------------------------------------------------------------------
    
    CuPy may not function correctly because multiple CuPy packages are installed
    in your environment:
    
    cupy, cupy-cuda12x
    
    Follow these steps to resolve this issue:
    
    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:
    
         $ pip uninstall <package_name>
    
      If you previously installed CuPy via conda, also run the following:
    
         $ conda uninstall cupy
    
    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.
    
         https://docs.cupy.dev/en/stable/install.html
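Before uninstalling anything, it can help to list exactly which CuPy distributions are present. This is a minimal sketch using only the standard library's `importlib.metadata`; the function names are hypothetical, and the name pattern (`cupy` or `cupy-cuda*`) is an assumption based on the two packages named in the warning.

```python
from importlib import metadata


def is_cupy_distribution(name: str) -> bool:
    """Match 'cupy' and CUDA-specific wheels such as 'cupy-cuda12x'."""
    normalized = name.lower().replace("_", "-")
    return normalized == "cupy" or normalized.startswith("cupy-cuda")


def installed_cupy_distributions() -> dict:
    """Map each installed CuPy distribution name to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_cupy_distribution(name):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # More than one entry here matches the situation CuPy warns about.
    for name, version in sorted(installed_cupy_distributions().items()):
        print(f"{name}=={version}")
```

This only reads package metadata, so it is safe to run even when `import cupy` itself fails.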

I know that running [pip uninstall cupy -y] then breaks the import as follows:


    ImportError                               Traceback (most recent call last)
    Cell In[4], line 1
    ----> 1 import cudf as cf #Use Rapids framework dataframe for GPU (PANDAS)
          2 import cupy as cp #Use Rapids framework arrays for GPU (NUMPY)

    File /opt/conda/lib/python3.10/site-packages/cudf/__init__.py:19
         16 from rmm.allocators.cupy import rmm_cupy_allocator
         17 from rmm.allocators.numba import RMMNumbaManager
    ---> 19 from cudf import api, core, datasets, testing
         20 from cudf._version import __git_commit__, __version__
         21 from cudf.api.extensions import (
         22     register_dataframe_accessor,
         23     register_index_accessor,
         24     register_series_accessor,
         25 )

    File /opt/conda/lib/python3.10/site-packages/cudf/datasets.py:7
          4 import pandas as pd
          6 import cudf
    ----> 7 from cudf._lib.transform import bools_to_mask
          8 from cudf.core.column_accessor import ColumnAccessor
         10 __all__ = ["timeseries", "randomdata"]

    File /opt/conda/lib/python3.10/site-packages/cudf/_lib/__init__.py:4
          1 # Copyright (c) 2020-2023, NVIDIA CORPORATION.
          2 import numpy as np
    ----> 4 from . import (
          5     avro,
          6     binaryop,
          7     concat,
          8     copying,
          9     csv,
         10     datetime,
         11     expressions,
         12     filling,
         13     groupby,
         14     hash,
         15     interop,
         16     join,
         17     json,
         18     labeling,
         19     merge,
         20     null_mask,
         21     nvtext,
         22     orc,
         23     parquet,
         24     partitioning,
         25     pylibcudf,
         26     quantiles,
         27     reduce,
         28     replace,
         29     reshape,
         30     rolling,
         31     round,
         32     search,
         33     sort,
         34     stream_compaction,
         35     string_casting,
         36     strings,
         37     strings_udf,
         38     text,
         39     timezone,
         40     transpose,
         41     unary,
         42 )
         44 MAX_COLUMN_SIZE = np.iinfo(np.int32).max
         45 MAX_COLUMN_SIZE_STR = "INT32_MAX"

    ImportError: /opt/conda/lib/python3.10/site-packages/cudf/_lib/avro.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN4cudf2io19avro_reader_options7builderENS0_11source_infoE


### Updating RAPIDS toolkit:
As the docs say:

https://docs.rapids.ai/install#system-req

    pip install \
        --extra-index-url=https://pypi.nvidia.com \
        cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* \
        cugraph-cu12==23.12.* cuspatial-cu12==23.12.* cuproj-cu12==23.12.* \
        cuxfilter-cu12==23.12.* cucim-cu12==23.12.* pylibraft-cu12==23.12.* \
        raft-dask-cu12==23.12.*

### This is what happens:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    cudf 23.8.0 requires cubinlinker, which is not installed.
    cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
    cudf 23.8.0 requires ptxcompiler, which is not installed.
    cuml 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
    dask-cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
    apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.7 which is incompatible.
    apache-beam 2.46.0 requires protobuf<4,>3.12.2, but you have protobuf 4.25.2 which is incompatible.
    apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 14.0.2 which is incompatible.
    beatrix-jupyterlab 2023.128.151533 requires jupyterlab~=3.6.0, but you have jupyterlab 4.0.11 which is incompatible.
    cudf 23.8.0 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.3.0 which is incompatible.
    cudf 23.8.0 requires pyarrow==11.*, but you have pyarrow 14.0.2 which is incompatible.
    cuml 23.8.0 requires dask==2023.7.1, but you have dask 2023.11.0 which is incompatible.
    cuml 23.8.0 requires dask-cuda==23.8.*, but you have dask-cuda 23.12.0 which is incompatible.
    cuml 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.11.0 which is incompatible.
    cuml 23.8.0 requires treelite==3.2.0, but you have treelite 3.9.1 which is incompatible.
    cuml 23.8.0 requires treelite-runtime==3.2.0, but you have treelite-runtime 3.9.1 which is incompatible.
    dask-cudf 23.8.0 requires dask==2023.7.1, but you have dask 2023.11.0 which is incompatible.
    dask-cudf 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.11.0 which is incompatible.
    google-cloud-aiplatform 0.6.0a1 requires google-api-core[grpc]<2.0.0dev,>=1.22.2, but you have google-api-core 2.11.1 which is incompatible.
    google-cloud-automl 1.0.1 requires google-api-core[grpc]<2.0.0dev,>=1.14.0, but you have google-api-core 2.11.1 which is incompatible.
    google-cloud-bigquery 2.34.4 requires packaging<22.0dev,>=14.3, but you have packaging 23.2 which is incompatible.
    google-cloud-bigquery 2.34.4 requires protobuf<4.0.0dev,>=3.12.0, but you have protobuf 4.25.2 which is incompatible.
    google-cloud-bigtable 1.7.3 requires protobuf<4.0.0dev, but you have protobuf 4.25.2 which is incompatible.
    google-cloud-pubsub 2.19.0 requires grpcio<2.0dev,>=1.51.3, but you have grpcio 1.51.1 which is incompatible.
    google-cloud-vision 2.8.0 requires protobuf<4.0.0dev,>=3.19.0, but you have protobuf 4.25.2 which is incompatible.
    jupyterlab 4.0.11 requires jupyter-lsp>=2.0.0, but you have jupyter-lsp 1.5.1 which is incompatible.
    jupyterlab-lsp 5.0.2 requires jupyter-lsp>=2.0.0, but you have jupyter-lsp 1.5.1 which is incompatible.
    kfp 2.5.0 requires google-cloud-storage<3,>=2.2.1, but you have google-cloud-storage 1.44.0 which is incompatible.
    kfp 2.5.0 requires protobuf<4,>=3.13.0, but you have protobuf 4.25.2 which is incompatible.
    kfp-pipeline-spec 0.2.2 requires protobuf<4,>=3.13.0, but you have protobuf 4.25.2 which is incompatible.
    libpysal 4.9.2 requires shapely>=2.0.1, but you have shapely 1.8.5.post1 which is incompatible.
    momepy 0.7.0 requires shapely>=2, but you have shapely 1.8.5.post1 which is incompatible.
    osmnx 1.8.1 requires shapely>=2.0, but you have shapely 1.8.5.post1 which is incompatible.
    pyldavis 3.4.1 requires pandas>=2.0.0, but you have pandas 1.5.3 which is incompatible.
    raft-dask 23.8.0 requires dask==2023.7.1, but you have dask 2023.11.0 which is incompatible.
    raft-dask 23.8.0 requires dask-cuda==23.8.*, but you have dask-cuda 23.12.0 which is incompatible.
    raft-dask 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.11.0 which is incompatible.
    rmm 23.8.0 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.3.0 which is incompatible.
    spopt 0.6.0 requires shapely>=2.0.1, but you have shapely 1.8.5.post1 which is incompatible.
    tensorboard 2.15.1 requires protobuf<4.24,>=3.19.6, but you have protobuf 4.25.2 which is incompatible.
    tensorflow-metadata 0.14.0 requires protobuf<4,>=3.7, but you have protobuf 4.25.2 which is incompatible.
    tensorflow-transform 0.14.0 requires protobuf<4,>=3.7, but you have protobuf 4.25.2 which is incompatible.
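The resolver output is easier to read once grouped by the package that reports the conflict. This is a rough sketch, assuming the pip message keeps the two sentence shapes seen above ("which is not installed" and "but you have ... which is incompatible"); the regular expression and function names are hypothetical and not part of pip.

```python
import re
from collections import Counter

# Matches one conflict sentence from pip's dependency resolver message.
CONFLICT = re.compile(
    r"(?P<pkg>\S+) (?P<ver>\S+) requires (?P<req>.+?), "
    r"(?:which is not installed"
    r"|but you have (?P<have>\S+) (?P<have_ver>\S+) which is incompatible)\."
)


def parse_conflicts(pip_output: str) -> list:
    """Return one dict per conflict; 'have' is None for missing packages."""
    return [match.groupdict() for match in CONFLICT.finditer(pip_output)]


def conflicts_per_package(pip_output: str) -> Counter:
    """Count conflicts keyed by 'name version' of the complaining package."""
    return Counter(f"{c['pkg']} {c['ver']}" for c in parse_conflicts(pip_output))


if __name__ == "__main__":
    sample = (
        "cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed. "
        "rmm 23.8.0 requires cuda-python<12.0a0,>=11.7.1, "
        "but you have cuda-python 12.3.0 which is incompatible."
    )
    for row in parse_conflicts(sample):
        print(row["pkg"], row["ver"], "needs", row["req"], "| installed:", row["have_ver"])
```

Applied to the log above, the entries for `cudf 23.8.0`, `cuml 23.8.0`, `dask-cudf 23.8.0`, `raft-dask 23.8.0` and `rmm 23.8.0` all ask for CUDA 11 dependencies (`cupy-cuda11x`, `cuda-python<12.0a0`), which suggests the 23.8 packages are still installed next to the new `-cu12` wheels.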


2.  Update the CUDA drivers to 12.3 (currently 12.2)
3.  Downgrade the CUDA drivers to 11.8 ??? (better compatibility with other RAPIDS tools)

Hvnt3rK3ys commented 7 months ago

In previous Docker environments [Pin to original environment (2023-12-13)] with CUDA 11.8, the Dask framework worked with RAPIDS. I believe the upgrade you made in January is the cause. The current version is [Pin to original environment (2024-01-29)] with CUDA 12.2.

djherbis commented 6 months ago

Filed https://b.corp.google.com/issues/328057594

Have you found any actual issues besides the warning when using cudf/cupy?