cupy / cupy

NumPy & SciPy for GPU
https://cupy.dev
MIT License

cufft call with callback producing CUFFT_INTERNAL_ERROR #8309

Closed by psteinb 6 months ago

psteinb commented 6 months ago

Description

We've been struggling to get FFTs on 2D complex fields running. We would like to use cuFFT with callbacks on NVIDIA GPUs. We've been able to isolate the problem in a minimal reproducing unit test; see https://gist.github.com/psteinb/bc52a4820b1ed743d8dd8c4d24524b7c

When this suite is run, we get a multitude of cuFFT errors or CUDA memory access errors. We've been staring at this for some hours now and would appreciate feedback on whether this is a CuPy bug.

        with cp.fft.config.set_cufft_callbacks(cb_store = backwardCallback,
                                                       cb_store_aux_arr = kernel ):
            y = cp.fft.fft2(x)
            #cp.fft.fft2(cp.from_dlpack(x))

>       y_ = cp.fft.ifft2(y, norm = "forward")

tests/livereco_server/models/test_fresnel_propagator_backward.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/p/software/jurecadc/stages/2024/software/CuPy/12.2.0-gcccoreflexiblas-12.3.0-3.3.1-CUDA-12/lib/python3.11/site-packages/cupy/fft/_fft.py:751: in ifft2
    return func(a, s, axes, norm, cufft.CUFFT_INVERSE)
/p/software/jurecadc/stages/2024/software/CuPy/12.2.0-gcccoreflexiblas-12.3.0-3.3.1-CUDA-12/lib/python3.11/site-packages/cupy/fft/_fft.py:617: in _fftn
    a = _exec_fftn(a, direction, value_type, norm=norm, axes=axes_sorted,
/p/software/jurecadc/stages/2024/software/CuPy/12.2.0-gcccoreflexiblas-12.3.0-3.3.1-CUDA-12/lib/python3.11/site-packages/cupy/fft/_fft.py:517: in _exec_fftn
    plan = _get_cufft_plan_nd(a.shape, fft_type, axes=axes, order=order,
/p/software/jurecadc/stages/2024/software/CuPy/12.2.0-gcccoreflexiblas-12.3.0-3.3.1-CUDA-12/lib/python3.11/site-packages/cupy/fft/_fft.py:459: in _get_cufft_plan_nd
    plan = cufft.PlanNd(*keys)
cupy/cuda/cufft.pyx:800: in cupy.cuda.cufft.PlanNd.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR

cupy/cuda/cufft.pyx:169: CuFFTError
========================================================================== short test summary info ==========================================================================
FAILED tests/livereco_server/models/test_fresnel_propagator_backward.py::test_backward_double_callback_cpfft - cupy.cuda.cufft.CuFFTError: CUFFT_INTERNAL_ERROR
====================================================================== 1 failed, 3 deselected in 4.96s ======================================================================

To Reproduce

After CuPy is set up, install pytest and run the tests with the following command:

python -m pytest ./test_fresnel_propagator_backward.py -k 'cpfft'

Installation

Source (pip install cupy)

Environment

>>> cp.show_config()
OS                           : Linux-4.18.0-513.18.1.el8_9.x86_64-x86_64-with-glibc2.28
Python Version               : 3.11.3
CuPy Version                 : 12.2.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.25.1
SciPy Version                : 1.11.1
Cython Build Version         : 0.29.35
Cython Runtime Version       : 0.29.35
CUDA Root                    : /p/software/jurecadc/stages/2024/software/CUDA/12
nvcc PATH                    : /p/software/jurecadc/stages/2024/software/CUDA/12/bin/nvcc
CUDA Build Version           : 12020
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12020
cuBLAS Version               : (available)
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (12, 2)
Thrust Version               : 200101
CUB Build Version            : 200101
Jitify Build Version         : <unknown>
cuDNN Build Version          : 8905
cuDNN Version                : 8905
NCCL Build Version           : 21803
NCCL Runtime Version         : 21803
cuTENSOR Version             : 10700
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A100-SXM4-40GB
Device 0 Compute Capability  : 80
Device 0 PCI Bus ID          : 0000:03:00.0

The error also occurs with:

>>> cp.show_config()
OS                           : Linux-4.18.0-513.18.1.el8_9.x86_64-x86_64-with-glibc2.28
Python Version               : 3.11.3
CuPy Version                 : 13.1.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : 1.13.0
Cython Build Version         : 0.29.37
Cython Runtime Version       : 3.0.10
CUDA Root                    : /p/software/jurecadc/stages/2024/software/CUDA/12
nvcc PATH                    : /p/software/jurecadc/stages/2024/software/CUDA/12/bin/nvcc
CUDA Build Version           : 12020
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12020 (linked to CuPy) / 12020 (locally installed)
cuBLAS Version               : (available)
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (12, 2)
Thrust Version               : 200200
CUB Build Version            : 200200
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A100-SXM4-40GB
Device 0 Compute Capability  : 80
Device 0 PCI Bus ID          : 0000:03:00.0

Additional Information

No response

leofang commented 6 months ago

@psteinb I can reproduce the error. The problem is that your callback was written for single-precision complex data, but you allocated the data as double-precision complex (cp.cfloat). Changing the dtype to cp.complex64 fixes the problem.
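A mismatch like this could be caught before any plan is built with a small guard. The following is a minimal sketch rather than part of CuPy: the table `CALLBACK_ELEMENT_DTYPES` and the helper `check_callback_dtype` are hypothetical names introduced here for illustration. It relies only on the fact that `cufftComplex` holds two 32-bit floats (matching `complex64`) and `cufftDoubleComplex` holds two 64-bit floats (matching `complex128`).

```python
# Hypothetical helper, not part of CuPy: maps the element type used in a
# cuFFT callback's CUDA source to the matching array dtype name and item size.
CALLBACK_ELEMENT_DTYPES = {
    "cufftComplex": ("complex64", 8),          # two float32 values
    "cufftDoubleComplex": ("complex128", 16),  # two float64 values
}


def check_callback_dtype(array_dtype, callback_element_type):
    """Raise TypeError if the array dtype does not match the callback's type.

    `array_dtype` may be a dtype name or anything whose str() is a dtype
    name, such as the `.dtype` attribute of a CuPy or NumPy array.
    """
    expected, itemsize = CALLBACK_ELEMENT_DTYPES[callback_element_type]
    actual = str(array_dtype)
    if actual != expected:
        raise TypeError(
            f"callback expects {callback_element_type} ({expected}, "
            f"{itemsize} bytes per element) but the array is {actual}"
        )
    return True


if __name__ == "__main__":
    # A single-precision callback with single-precision data passes.
    print(check_callback_dtype("complex64", "cufftComplex"))
    # A single-precision callback with double-precision data is rejected.
    try:
        check_callback_dtype("complex128", "cufftComplex")
    except TypeError as err:
        print("rejected:", err)
```

In the failing test, calling something like `check_callback_dtype(x.dtype, "cufftComplex")` just before entering `cp.fft.config.set_cufft_callbacks(...)` would presumably have reported the dtype mismatch directly instead of surfacing later as `CUFFT_INTERNAL_ERROR`.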

psteinb commented 6 months ago

That does it! Thanks so much. The tests now run through without errors. We are running into other problems with cuFFT now, which we need to tend to.