conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License

cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed #124

Closed. sandeepnmenon closed this issue 3 years ago

sandeepnmenon commented 3 years ago

Issue: cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

I installed cupy from the conda-forge channel with: conda install -c conda-forge cupy

The error occurs in the Module.load function:

line 22, in get_kernel_func
    module.load(bytes(ptx.encode()))
  File "cupy/cuda/function.pyx", line 241, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 243, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed
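
Because the failure happens while the driver JIT-compiles the PTX, one thing worth looking at is the header of the PTX string that NVRTC produced. PTX modules declare their ISA version and target architecture in `.version` and `.target` directives, and a driver older than the toolkit may reject a version it does not know. The following is only a sketch: `ptx_header` is a hypothetical helper, and the sample PTX text is made up for illustration, not taken from this report.

```python
import re


def ptx_header(ptx: str) -> dict:
    """Return the PTX ISA version and target declared in a PTX module."""
    version = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx, flags=re.MULTILINE)
    target = re.search(r"^\s*\.target\s+(\w+)", ptx, flags=re.MULTILINE)
    return {
        "isa_version": (int(version.group(1)), int(version.group(2))) if version else None,
        "target": target.group(1) if target else None,
    }


# Made-up header for illustration; in the script below the real input
# would be the string returned by prog.compile().
sample_ptx = """//
// Generated by NVIDIA NVVM Compiler
//
.version 7.1
.target sm_52
.address_size 64
"""

print(ptx_header(sample_ptx))
```

If the declared ISA version is newer than what the installed driver supports, that would be consistent with CUDA_ERROR_INVALID_PTX, although other causes are possible.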

Reproducible script

import torch
try:
    import cupy.cuda
    from pynvrtc.compiler import Program
except:
    pass
from collections import namedtuple
import numpy as np

modules = {}

def get_kernel_func(kname, ksrc, dtype):
    if kname+dtype not in modules:
        ksrc = ksrc.replace('DTYPE', dtype)
        #prog = Program(ksrc.encode('utf-8'), (kname+dtype+'.cu').encode('utf-8'))
        #uncomment the line above and comment the line below if it causes the following error: AttributeError: 'Program' object has no attribute '_program'
        prog = Program(ksrc, kname+dtype+'.cu')        
        ptx = prog.compile()
        log = prog._interface.nvrtcGetProgramLog(prog._program)
        if len(log.strip()) > 0: print(log)
        module = cupy.cuda.function.Module()
        module.load(bytes(ptx.encode()))
        modules[kname+dtype] = module
    else:
        module = modules[kname+dtype]

    Stream = namedtuple('Stream', ['ptr'])
    s = Stream(ptr=torch.cuda.current_stream().cuda_stream)        

    return module.get_function(kname), s

def conv_aggregate_fw_kernel_v2(**kwargs):
    kernel = r'''
extern "C"
__global__ void conv_aggregate_fw_kernel_v2(DTYPE* dest, const DTYPE* src, const long long* lengths, const long long* cslengths, int width, int N, int dest_stridex, int src_stridex, int blockDimy) {

    int x = blockIdx.x * blockDim.x + threadIdx.x; //one thread per feature channel, runs over all nodes
    if (x >= width) return;

    int i = blockIdx.y * blockDimy;
    int imax = min(N, i + blockDimy);
    dest += dest_stridex * i + x;
    src += src_stridex * (cslengths[i] - lengths[i]) + x;

    for (; i<imax; ++i) {   
        int len = lengths[i];
        if (len > 0) {
            DTYPE sum = 0;      
            for (int j=0; j<len; j++, src += src_stridex) {
                sum += *src;
            }

            *dest = sum / len;          
        }
        else {
            *dest = 0;
        }

        dest += dest_stridex;
    }
}
'''
    return kernel   

def get_dtype(t):
    if isinstance(t, torch.cuda.FloatTensor):
        return 'float'
    elif isinstance(t, torch.cuda.DoubleTensor):
        return 'double'

starte = 0
nume=1
idxn = torch.from_numpy(np.random.permutation(10))
input = torch.from_numpy(np.random.permutation(10))
src = torch.index_select(input, 0, idxn.narrow(0,starte,nume)).type(torch.cuda.FloatTensor)

function, stream = get_kernel_func('conv_aggregate_fw_kernel_v2', conv_aggregate_fw_kernel_v2(), get_dtype(src))


Environment (conda list):

```
$ conda list
# Name Version Build Channel
_anaconda_depends 2020.07 py38_0
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
alabaster 0.7.12 py_0
anaconda custom py38_1
anaconda-client 1.7.2 py38_0
anaconda-project 0.8.4 py_0
argh 0.26.2 py38_0
argon2-cffi 20.1.0 py38h7b6447c_1
ase 3.21.1 pypi_0 pypi
asn1crypto 1.4.0 py_0
astroid 2.4.2 py38_0
astropy 4.0.2 py38h7b6447c_0
async_generator 1.10 py_0
atomicwrites 1.4.0 py_0
attrs 20.3.0 pyhd3eb1b0_0
autopep8 1.5.4 py_0
babel 2.8.1 pyhd3eb1b0_0
backcall 0.2.0 py_0
backports 1.0 py_2
backports.shutil_get_terminal_size 1.0.0 py38_2
beautifulsoup4 4.9.3 pyhb0f4dca_0
bitarray 1.6.1 py38h27cfd23_0
bkcharts 0.2 py38_0
blas 1.0 mkl
bleach 3.2.1 py_0
blosc 1.20.1 hd408876_0
bokeh 2.2.3 py38_0
boost 1.73.0 py38_11 anaconda
boto 2.49.0 py38_0
bottleneck 1.3.2 py38heb32a55_1
brotlipy 0.7.0 py38h7b6447c_1000
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.10.14 0 anaconda
cairo 1.14.12 h8948797_3
certifi 2020.6.20 py38_0 anaconda
cffi 1.14.3 py38he30daa8_0
chardet 3.0.4 py38_1003
click 7.1.2 py_0
cloudpickle 1.6.0 py_0
clyent 1.2.2 py38_1
colorama 0.4.4 py_0
contextlib2 0.6.0.post1 py_0
cryptography 3.1.1 py38h1ba5d50_0
cudatoolkit 11.1.1 h6406543_8 conda-forge
cudnn 8.1.0.77 h90431f1_0 conda-forge
cupy 8.6.0 py38h5546af9_0 conda-forge
curl 7.71.1 hbc83047_1
cutensor 1.2.2.5 h96e36e3_3 conda-forge
cycler 0.10.0 py38_0
cython 0.29.21 py38he6710b0_0
cytoolz 0.11.0 py38h7b6447c_0
dask 2.30.0 py_0
dask-core 2.30.0 py_0
dbus 1.13.18 hb2f20db_0
decorator 4.4.2 py_0
defusedxml 0.6.0 py_0
diff-match-patch 20200713 py_0
distributed 2.30.1 py38h06a4308_0
docutils 0.16 py38_1
eigen 3.3.7 hfd86e86_0
eigen3 3.3.7 0 omnia
entrypoints 0.3 py38_0
et_xmlfile 1.0.1 py_1001
expat 2.2.10 he6710b0_2
fastcache 1.1.0 py38h7b6447c_0
fastrlock 0.6 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.0.12 py_0
flake8 3.8.4 py_0
flask 1.1.2 py_0
fontconfig 2.13.0 h9420a91_0
freetype 2.10.4 h5ab3b9f_0
fribidi 1.0.10 h7b6447c_0
fsspec 0.8.3 py_0
future 0.18.2 py38_1
get_terminal_size 1.0.0 haa9412d_0
gevent 20.9.0 py38h7b6447c_0
glib 2.66.1 h92f7085_0
glob2 0.7 py_0
gmp 6.1.2 h6c8ec71_1
gmpy2 2.0.8 py38hd5f6e3b_3
gnutls 3.6.13 h85f3911_1 conda-forge
googledrivedownloader 0.4 pypi_0 pypi
graphite2 1.3.14 h23475e2_0
greenlet 0.4.17 py38h7b6447c_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb31296c_0
h5py 2.10.0 py38h7918eee_0
harfbuzz 2.4.0 hca77d97_1
hdf5 1.10.4 hb1b8bf9_0
heapdict 1.0.1 py_0
html5lib 1.1 py_0
icu 58.2 he6710b0_3
idna 2.10 py_0
imageio 2.9.0 py_0
imagesize 1.2.0 py_0
importlib-metadata 2.0.0 py_1
importlib_metadata 2.0.0 1
iniconfig 1.1.1 py_0
intel-openmp 2020.2 254
intervaltree 3.1.0 py_0
ipykernel 5.3.4 py38h5ca1d4c_0
ipython 7.19.0 py38hb070fc8_0
ipython_genutils 0.2.0 py38_0
ipywidgets 7.5.1 py_1
isodate 0.6.0 pypi_0 pypi
isort 5.6.4 py_0
itsdangerous 1.1.0 py_0
jbig 2.1 hdba287a_0
jdcal 1.4.1 py_0
jedi 0.17.1 py38_0
jeepney 0.5.0 pyhd3eb1b0_0
jinja2 2.11.2 py_0
joblib 0.17.0 py_0
jpeg 9b h024ee3a_2
json5 0.9.5 py_0
jsonpatch 1.32 pypi_0 pypi
jsonpointer 2.1 pypi_0 pypi
jsonschema 3.2.0 py_2
jupyter 1.0.0 py38_7
jupyter_client 6.1.7 py_0
jupyter_console 6.2.0 py_0
jupyter_core 4.6.3 py38_0
jupyterlab 2.2.6 py_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 1.2.0 py_0
keyring 21.4.0 py38_1
kiwisolver 1.3.0 py38h2531618_0
krb5 1.18.2 h173b8e3_0
lame 3.100 h7f98852_1001 conda-forge
lazy-object-proxy 1.4.3 py38h7b6447c_0
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libarchive 3.4.2 h62408e4_0
libboost 1.73.0 h37e3b65_11 anaconda
libcurl 7.71.1 h20c2e04_1
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h2828fa1_19 conda-forge
libgfortran-ng 7.3.0 hdf63c60_0
libgomp 9.3.0 h2828fa1_19 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblief 0.10.1 he6710b0_0
libllvm10 10.0.1 hbcb73fb_5
libllvm9 9.0.1 hf817b99_2 conda-forge
libpng 1.6.37 hbc83047_0
libsodium 1.0.18 h7b6447c_0
libspatialindex 1.9.3 he6710b0_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge
libtiff 4.1.0 h2733197_1
libtool 2.4.6 h7b6447c_1005
libuuid 1.0.3 h1bed415_2
libuv 1.41.0 h7f98852_0 conda-forge
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
libxslt 1.1.34 hc22bd24_0
llvmlite 0.34.0 py38h269e1b5_4
locket 0.2.0 py38_1
lxml 4.6.1 py38hefd8a0e_0
lz4-c 1.9.2 heb0550a_3
lzo 2.10 h7b6447c_2
markupsafe 1.1.1 py38h7b6447c_0
matplotlib 3.3.2 0
matplotlib-base 3.3.2 py38h817c723_0
mccabe 0.6.1 py38_1
mistune 0.8.4 py38h7b6447c_1000
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.2.0 py38h23d657b_0
mkl_random 1.1.1 py38h0573a6f_0
mock 4.0.2 py_0
more-itertools 8.6.0 pyhd3eb1b0_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.1.0 py38_0
msgpack-python 1.0.0 py38hfd86e86_1
multipledispatch 0.6.0 py38_0
nbclient 0.5.1 py_0
nbconvert 6.0.7 py38_0
nbformat 5.0.8 py_0
nccl 2.9.6.1 h97a9cb7_0 conda-forge
ncurses 6.2 he6710b0_1
nest-asyncio 1.4.2 pyhd3eb1b0_0
nettle 3.6 he412f7d_0 conda-forge
networkx 2.5 py_0
ninja 1.10.2 h4bd325d_0 conda-forge
nltk 3.5 py_0
nose 1.3.7 py38_2
notebook 6.1.4 py38_0
numba 0.51.2 py38h0573a6f_1
numexpr 2.7.1 py38h423224d_0
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
numpydoc 1.1.0 pyhd3eb1b0_1
olefile 0.46 py_0
openh264 2.1.1 h780b84a_0 conda-forge
openpyxl 3.0.5 py_0
openssl 1.1.1h h7b6447c_0 anaconda
packaging 20.4 py_0
pandas 1.1.3 py38he6710b0_0
pandoc 2.11 hb0f4dca_0
pandocfilters 1.4.3 py38h06a4308_1
pango 1.45.3 hd140c19_0
parso 0.7.0 py_0
partd 1.1.0 py_0
patchelf 0.12 he6710b0_0
path 15.0.0 py38_0
path.py 12.5.0 0
pathlib2 2.3.5 py38_0
pathtools 0.1.2 py_1
patsy 0.5.1 py38_0
pcre 8.44 he6710b0_0
pep8 1.7.1 py38_0
pexpect 4.8.0 py38_0
pickleshare 0.7.5 py38_1000
pillow 8.0.1 py38he98fc37_0
pip 20.2.4 py38h06a4308_0
pixman 0.40.0 h7b6447c_0
pkginfo 1.6.1 py38h06a4308_0
pluggy 0.13.1 py38_0
ply 3.11 py38_0
plyfile 0.7.3 pypi_0 pypi
prometheus_client 0.8.0 py_0
prompt-toolkit 3.0.8 py_0
prompt_toolkit 3.0.8 0
psutil 5.7.2 py38h7b6447c_0
ptyprocess 0.6.0 py38_0
py 1.9.0 py_0
py-boost 1.73.0 py38h962f231_11 anaconda
py-lief 0.10.1 py38h403a769_0
pycodestyle 2.6.0 py_0
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_2
pycurl 7.43.0.6 py38h1ba5d50_0
pydocstyle 5.1.1 py_0
pyflakes 2.2.0 py_0
pygments 2.7.2 pyhd3eb1b0_0
pylint 2.6.0 py38_0
pynvrtc 9.2 pypi_0 pypi
pyodbc 4.0.30 py38he6710b0_0
pyopenssl 19.1.0 py_1
pyparsing 2.4.7 py_0
pyqt 5.9.2 py38h05f1152_4
pyrsistent 0.17.3 py38h7b6447c_0
pysocks 1.7.1 py38_0
pytables 3.6.1 py38h9fd0a39_0
pytest 6.1.1 py38_0
python 3.8.5 h7579374_1
python-dateutil 2.8.1 py_0
python-igraph 0.9.1 pypi_0 pypi
python-jsonrpc-server 0.4.0 py_0
python-language-server 0.35.1 py_0
python-libarchive-c 2.9 py_0
python-louvain 0.15 pypi_0 pypi
python_abi 3.8 1_cp38 conda-forge
pytorch 1.8.1 py3.8_cuda11.1_cudnn8.0.5_0 pytorch
pytz 2020.1 py_0
pywavelets 1.1.1 py38h7b6447c_2
pyxdg 0.27 pyhd3eb1b0_0
pyyaml 5.3.1 py38h7b6447c_1
pyzmq 19.0.2 py38he6710b0_1
qdarkstyle 2.8.1 py_0
qt 5.9.7 h5867ecd_1
qtawesome 1.0.1 py_0
qtconsole 4.7.7 py_0
qtpy 1.9.0 py_0
rdflib 5.0.0 pypi_0 pypi
readline 8.0 h7b6447c_0
regex 2020.10.15 py38h7b6447c_0
requests 2.24.0 py_0
ripgrep 12.1.1 0
rope 0.18.0 py_0
rtree 0.9.4 py38_1
ruamel_yaml 0.15.87 py38h7b6447c_1
scikit-image 0.17.2 py38hdf5156a_0
scikit-learn 0.23.2 py38h0573a6f_0
scipy 1.5.2 py38h0b6359f_0
seaborn 0.11.0 py_0
secretstorage 3.1.2 py38_0
send2trash 1.5.0 py38_0
setuptools 50.3.1 py38h06a4308_1
simplegeneric 0.8.1 py38_2
singledispatch 3.4.0.3 py_1001
sip 4.19.13 py38he6710b0_0
six 1.15.0 py38h06a4308_0
sklearn 0.0 pypi_0 pypi
snappy 1.1.8 he1b5a44_3 conda-forge
snowballstemmer 2.0.0 py_0
sortedcollections 1.2.1 py_0
sortedcontainers 2.2.2 py_0
soupsieve 2.0.1 py_0
sphinx 3.2.1 py_0
sphinxcontrib 1.0 py38_1
sphinxcontrib-applehelp 1.0.2 py_0
sphinxcontrib-devhelp 1.0.2 py_0
sphinxcontrib-htmlhelp 1.0.3 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.3 py_0
sphinxcontrib-serializinghtml 1.1.4 py_0
sphinxcontrib-websupport 1.2.4 py_0
spyder 4.1.5 py38_0
spyder-kernels 1.9.4 py38_0
sqlalchemy 1.3.20 py38h7b6447c_0
sqlite 3.33.0 h62c20be_0
statsmodels 0.12.0 py38h7b6447c_0
sympy 1.6.2 py38h06a4308_1
tbb 2020.3 hfd86e86_0
tblib 1.7.0 py_0
terminado 0.9.1 py38_0
testpath 0.4.4 py_0
texttable 1.6.3 pypi_0 pypi
threadpoolctl 2.1.0 pyh5ca1d4c_0
tifffile 2020.10.1 py38hdd07704_2
tk 8.6.10 hbc83047_0
toml 0.10.1 py_0
toolz 0.11.1 py_0
torch-cluster 1.5.9 pypi_0 pypi
torch-geometric 1.7.0 pypi_0 pypi
torch-scatter 2.0.6 pypi_0 pypi
torch-sparse 0.6.9 pypi_0 pypi
torch-spline-conv 1.2.1 pypi_0 pypi
torchaudio 0.8.1 py38 pytorch
torchfile 0.1.0 pypi_0 pypi
torchnet 0.0.5.1 pypi_0 pypi
torchvision 0.9.1 py38_cu111 pytorch
tornado 6.0.4 py38h7b6447c_1
tqdm 4.50.2 py_0
traitlets 5.0.5 py_0
transforms3d 0.3.1 pypi_0 pypi
typing_extensions 3.7.4.3 py_0
ujson 4.0.1 py38he6710b0_0
unicodecsv 0.14.1 py38_0
unixodbc 2.3.9 h7b6447c_0
urllib3 1.25.11 py_0
visdom 0.1.8.9 pypi_0 pypi
watchdog 0.10.3 py38_0
wcwidth 0.2.5 py_0
webencodings 0.5.1 py38_1
websocket-client 0.58.0 pypi_0 pypi
werkzeug 1.0.1 py_0
wheel 0.35.1 py_0
widgetsnbextension 3.5.1 py38_0
wrapt 1.11.2 py38h7b6447c_0
wurlitzer 2.0.1 py38_0
xlrd 1.2.0 py_0
xlsxwriter 1.3.7 py_0
xlwt 1.3.0 py38_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
yapf 0.30.0 py_0
zeromq 4.3.3 he6710b0_3
zict 2.0.0 py_0
zipp 3.4.0 pyhd3eb1b0_0
zlib 1.2.11 h7b6447c_3
zope 1.0 py38_1
zope.event 4.5.0 py38_0
zope.interface 5.1.2 py38h7b6447c_0
zstd 1.4.5 h9ceee32_0
```


Details about conda and system ( conda info ):

```
$ conda info
     active environment : superpoint
    active env location : /home/anaconda3/envs/superpoint
            shell level : 2
       user config file : /home/.condarc
 populated config files :
          conda version : 4.9.2
    conda-build version : 3.18.11
         python version : 3.8.3.final.0
       virtual packages : __cuda=11.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/anaconda3 (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/anaconda3/pkgs
                          /home/.conda/pkgs
       envs directories : /home/anaconda3/envs
                          /home/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.3 Linux/5.4.0-72-generic ubuntu/18.04.5 glibc/2.27
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
```

leofang commented 3 years ago

Try downgrading your cudatoolkit to 11.0. I think your driver version does not match the cudatoolkit version.
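
The mismatch can be read off the report itself: `conda info` lists the virtual package `__cuda=11.0=0` (the newest CUDA version the driver supports), while `conda list` shows cudatoolkit 11.1.1. A rough sketch of that comparison follows. The helper names are hypothetical; `cupy.cuda.runtime.driverGetVersion()` and `runtimeGetVersion()` are existing CuPy calls, but `live_versions` only works on a machine with a functioning GPU setup, so it is defined and not called here.

```python
import re


def parse_version(text):
    """Turn '11.1.1' or '11.0' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def decode_cuda_int(value):
    """Decode CUDA's integer encoding (1000 * major + 10 * minor)."""
    return value // 1000, (value % 1000) // 10


def toolkit_newer_than_driver(driver, toolkit):
    """True when the toolkit targets a newer CUDA than the driver reports."""
    return toolkit > driver


# Values copied from the report: `__cuda=11.0=0` and cudatoolkit 11.1.1.
driver = parse_version(re.search(r"__cuda=([\d.]+)", "__cuda=11.0=0").group(1))
toolkit = parse_version("11.1.1")
print(driver, toolkit, toolkit_newer_than_driver(driver, toolkit))


def live_versions():
    """Query (driver, runtime) versions from CuPy; needs a working GPU setup."""
    import cupy  # deferred so the parsing helpers run without CuPy installed
    return (
        decode_cuda_int(cupy.cuda.runtime.driverGetVersion()),
        decode_cuda_int(cupy.cuda.runtime.runtimeGetVersion()),
    )
```

A toolkit newer than the driver is a hint rather than proof of the cause, but it is cheap to check before digging into the kernel code.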

leofang commented 3 years ago

As an aside: your get_kernel_func looks a bit nasty 😄 Maybe you would like to consider using cupy.RawModule, which does the same thing for you (but is a lot cleaner)?

(cupy.cuda.function.Module() is an internal API and we don't guarantee it's stable across versions.)
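
For readers who want to see what the cupy.RawModule route might look like, here is a minimal sketch. It keeps the DTYPE substitution and per-dtype caching from the script above, but the kernel (`scale_kernel`) and the helpers `render_source` and `scale_on_gpu` are made up for illustration and are not from the original report. CuPy is imported lazily so the templating part runs without a GPU.

```python
import functools

# Hypothetical toy kernel; DTYPE is substituted before compilation,
# mirroring the ksrc.replace('DTYPE', dtype) step in the original script.
KERNEL_TEMPLATE = r'''
extern "C" __global__
void scale_kernel(DTYPE* dest, const DTYPE* src, DTYPE factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dest[i] = src[i] * factor;
}
'''


def render_source(dtype):
    """Fill in the element type ('float' or 'double')."""
    return KERNEL_TEMPLATE.replace("DTYPE", dtype)


@functools.lru_cache(maxsize=None)
def get_kernel_func(kname, dtype):
    """Compile once per (kernel name, dtype) and reuse the result."""
    import cupy  # deferred: only needed when a kernel is actually requested
    module = cupy.RawModule(code=render_source(dtype))
    return module.get_function(kname)


def scale_on_gpu(values, factor):
    """Launch the toy kernel on a non-empty float32 array (requires a GPU)."""
    import cupy
    src = cupy.asarray(values, dtype=cupy.float32)
    dest = cupy.empty_like(src)
    kernel = get_kernel_func("scale_kernel", "float")
    threads = 128
    blocks = (src.size + threads - 1) // threads
    # Scalars are passed as fixed-width NumPy types to match the C signature.
    kernel((blocks,), (threads,),
           (dest, src, cupy.float32(factor), cupy.int32(src.size)))
    return cupy.asnumpy(dest)
```

With this approach CuPy handles the NVRTC compilation and module loading itself, so the pynvrtc dependency and the manual `Module.load` call are no longer needed.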

sandeepnmenon commented 3 years ago

@leofang Thank you. Matching the toolkit version with the driver worked. Thanks also for the suggestion: I tried it out and it works the same, so I will make the change.

leofang commented 3 years ago

Glad to know, @sandeepnmenon!