Closed upsj closed 3 years ago
Thanks for the bug report. It looks like a fix for this might already be coming in the next release but I'll confirm.
@upsj After looking closer at your test program, it looks like you've got some errors in it:
istride
and ostride
must be 1 for contiguous data. Note that if your data is contiguous, you can pass null pointers for inembed
and onembed
and hipFFT will choose equivalent defaults.idist
and odist
must be the number of elements between batches. For contiguous batches, that's the same as the int size
variable in your code.This works:
#include "hipfft.h"
int main() {
hipfftDoubleComplex* in;
hipfftDoubleComplex* out;
hipfftHandle handle;
int sizes[] = {16, 32, 64};
int size = sizes[0] * sizes[1] * sizes[2];
int batch = 2;
size_t worksize;
hipMalloc(&in, sizeof(hipfftDoubleComplex) * size * batch);
hipMalloc(&out, sizeof(hipfftDoubleComplex) * size * batch);
hipfftCreate(&handle);
hipfftMakePlanMany(handle, 3, sizes, sizes, 1, size, sizes, 1, size, HIPFFT_Z2Z, batch, &worksize);
hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD);
hipDeviceSynchronize();
hipFree(out);
hipFree(in);
}
I'm closing this issue - please feel free to comment if you have any questions. We can open this issue or another issue if you run into additional problems.
I think you slightly misunderstood my use case - the interleaved, non-contiguous storage is intended, since due to interface consideration, we store the FFT for each batch as a column in a row-major matrix.
Formally, with dimensions (n,m,k) and batch count c, the index of the entry (x,y,z) in batch b is x*s2*s3*c + y*s3*c + z*c + b
.
The example I posted is minimized, we encounter the same issue in practice, and the identical invocation with cuFFT works.
Ok, I see. I don't have an immediate solution to your problem but will investigate.
That's great to hear, thanks! Just let me know when you have a solution, I will disable the offending tests until then.
f9006e40dd4cbdcf3b0ec220b22bb78e5a8ab033 fixes this in the develop branch. It should be included in the next release. Please comment/reopen if you still see problems.
I am working on an hipFFT wrapper for an HPC library, where I basically hipified our cuFFT wrapper code 1:1 (only replacing 64 bit calls by 32 bit calls). The CUDA and HIP-CUDA tests work and give correct results, only when compiling everything with rocFFT on an AMD device do I get a memory access fault. My guess would be that this is related to the "interleaved batch" memory layout we are using, where the innermost dimension is the batch dimension, followed by the 3 FFT dimensions.
Let me know if you need any additional information
What is the expected behavior
What actually happens
Memory access fault by GPU node-1 (Agent handle: ...) on address .... Reason: Page not present or supervisor privilege.
How to reproduce
batch
to 1 works correctly)int main() { hipfftDoubleComplex in; hipfftDoubleComplex out; hipfftHandle handle; int sizes[] = {16, 32, 64}; int size = sizes[0] sizes[1] sizes[2]; int batch = 2; size_t worksize; hipMalloc(&in, sizeof(hipfftDoubleComplex) size batch); hipMalloc(&out, sizeof(hipfftDoubleComplex) size batch); hipfftCreate(&handle); hipfftMakePlanMany(handle, 3, sizes, sizes, batch, 1, sizes, batch, 1, HIPFFT_Z2Z, batch, &worksize); hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD); hipDeviceSynchronize(); hipFree(out); hipFree(in);
}