ROCm / rocFFT

Next generation FFT implementation for ROCm
https://rocm.docs.amd.com/projects/rocFFT/en/latest/
Other
173 stars 84 forks source link

Batched 3D FFT memory access fault #311

Closed upsj closed 3 years ago

upsj commented 3 years ago

I am working on an hipFFT wrapper for an HPC library, where I basically hipified our cuFFT wrapper code 1:1 (only replacing 64 bit calls by 32 bit calls). The CUDA and HIP-CUDA tests work and give correct results, only when compiling everything with rocFFT on an AMD device do I get a memory access fault. My guess would be that this is related to the "interleaved batch" memory layout we are using, where the innermost dimension is the batch dimension, followed by the 3 FFT dimensions.

Let me know if you need any additional information

What is the expected behavior

What actually happens

How to reproduce

int main() { hipfftDoubleComplex in; hipfftDoubleComplex out; hipfftHandle handle; int sizes[] = {16, 32, 64}; int size = sizes[0] sizes[1] sizes[2]; int batch = 2; size_t worksize; hipMalloc(&in, sizeof(hipfftDoubleComplex) size batch); hipMalloc(&out, sizeof(hipfftDoubleComplex) size batch); hipfftCreate(&handle); hipfftMakePlanMany(handle, 3, sizes, sizes, batch, 1, sizes, batch, 1, HIPFFT_Z2Z, batch, &worksize); hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD); hipDeviceSynchronize(); hipFree(out); hipFree(in);

}



### Environment
| Hardware | description |
|-----|-----|
| GPU | Radeon VII |
| CPU | AMD Ryzen Threadripper 1920X |

| Software | version |
|-----|-----|
| HIP | 4.0.20496-4f163c68 |
| hipFFT | 1.0.2.57-be3a15d | 
| rocFFT | 1.0.8.966-rocm-rel-4.0-23-2d35fd6 |
| hip-clang | dac2bfceaa8d4a90257dc8a6d58f268e172ce00e |
evetsso commented 3 years ago

Thanks for the bug report. It looks like a fix for this might already be coming in the next release but I'll confirm.

evetsso commented 3 years ago

@upsj After looking closer at your test program, it looks like you've got some errors in it:

This works:

#include "hipfft.h"

int main() {
    hipfftDoubleComplex* in;
    hipfftDoubleComplex* out;
    hipfftHandle handle;
    int sizes[] = {16, 32, 64};
    int size = sizes[0] * sizes[1] * sizes[2];
    int batch = 2;
    size_t worksize;
    hipMalloc(&in, sizeof(hipfftDoubleComplex) * size * batch);
    hipMalloc(&out, sizeof(hipfftDoubleComplex) * size * batch);
    hipfftCreate(&handle);
    hipfftMakePlanMany(handle, 3, sizes, sizes, 1, size, sizes, 1, size, HIPFFT_Z2Z, batch, &worksize);
    hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD);
    hipDeviceSynchronize();
    hipFree(out);
    hipFree(in);
}

I'm closing this issue - please feel free to comment if you have any questions. We can open this issue or another issue if you run into additional problems.

upsj commented 3 years ago

I think you slightly misunderstood my use case - the interleaved, non-contiguous storage is intended, since due to interface consideration, we store the FFT for each batch as a column in a row-major matrix. Formally, with dimensions (n,m,k) and batch count c, the index of the entry (x,y,z) in batch b is x*s2*s3*c + y*s3*c + z*c + b. The example I posted is minimized, we encounter the same issue in practice, and the identical invocation with cuFFT works.

evetsso commented 3 years ago

Ok, I see. I don't have an immediate solution to your problem but will investigate.

upsj commented 3 years ago

That's great to hear, thanks! Just let me know when you have a solution, I will disable the offending tests until then.

evetsso commented 3 years ago

f9006e40dd4cbdcf3b0ec220b22bb78e5a8ab033 fixes this in the develop branch. It should be included in the next release. Please comment/reopen if you still see problems.