clMathLibraries / clFFT

a software library containing FFT functions written in OpenCL
Apache License 2.0

clFFT-client crash on embedded System #201

Closed: CluelessDuck closed this issue 7 years ago

CluelessDuck commented 7 years ago

When running clFFT-client on an embedded system (Phytec Phycore-RK3288) with Yocto Linux, the program crashes for certain FFT lengths. It crashed with FFT lengths of 1024, 2048 and 4096. Lengths of 256 and 512 worked without problems, and with an FFT length of 16384 the reported execution wall time and execution gflops were negative. With real input (--inLayout 5) and Hermitian_Interleaved output (--outLayout 3), the error did not occur.

My own FFT implementation using clFFT led to the same problem when using Complex_Interleaved input and output. With real input, my implementation (mostly) worked on the system. In my implementation, the error occurs during the execution of clfftBakePlan().

The FFT sizes I want to use on the embedded system are exactly the ones that don't work (2048C and 4096C).

The clinfo output of my System is:

root@phycore-rk3288-3:~# clFFT-client --clinfo
OpenCL platform [ 0 ]:
    CL_PLATFORM_PROFILE:     FULL_PROFILE
    CL_PLATFORM_VERSION:     OpenCL 1.2 v1.r12p0-04rel0.c7cf4c8b39970391360f91824733eb1a
    CL_PLATFORM_NAME:        ARM Platform
    CL_PLATFORM_VENDOR:      ARM
    CL_PLATFORM_EXTENSIONS:  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

OpenCL platform [ 0 ], device [ 0 ]:
    CL_DEVICE_NAME:                      Mali-T760
    CL_DEVICE_VERSION:                   OpenCL 1.2 v1.r12p0-04rel0.c7cf4c8b39970391360f91824733eb1a
    CL_DRIVER_VERSION:                   1.2
    CL_DEVICE_TYPE:                      GPU
    CL_DEVICE_MAX_CLOCK_FREQUENCY:       400
    CL_DEVICE_ADDRESS_BITS:              64
    CL_DEVICE_AVAILABLE:                 TRUE
    CL_DEVICE_COMPILER_AVAILABLE:        TRUE
    CL_DEVICE_OPENCL_C_VERSION:          OpenCL C 1.2 v1.r12p0-04rel0.c7cf4c8b39970391360f91824733eb1a
    CL_DEVICE_MAX_WORK_GROUP_SIZE:       256
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:  3
                         Dimension[ 0 ]  256
                         Dimension[ 1 ]  256
                         Dimension[ 2 ]  256
    CL_DEVICE_HOST_UNIFIED_MEMORY:       TRUE
    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:  65536 ( 64 KB )
    CL_DEVICE_LOCAL_MEM_SIZE:            32768 ( 32 KB )
    CL_DEVICE_GLOBAL_MEM_SIZE:           1055031296 ( 1006 MB )
    CL_DEVICE_MAX_MEM_ALLOC_SIZE:        263757824 ( 251 MB )
    CL_DEVICE_EXTENSIONS:                cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

The error message is:

root@phycore-rk3288-3:~# clFFT-client --platform 0 --device 0 -x 1024 -b 1 -p 100

                        BUILD LOG
************************************************
<source>:1592:19: error: initializing '__global float4 *' with an expression of incompatible type '__global float2 *'
        __global float4 *buff4g = bufOut;
                         ^        ~~~~~~

<source>:2027:19: error: initializing '__global float4 *' with an expression of incompatible type '__global float2 *'
        __global float4 *buff4g = bufOut;
                         ^        ~~~~~~

<source>:2040:49: warning: unknown attribute 'max_constant_size' ignored
void fft_fwd(__constant cb_t *cb __attribute__((max_constant_size(32))), __global float2 * restrict gb)
                                                ^

<source>:2066:50: warning: unknown attribute 'max_constant_size' ignored
void fft_back(__constant cb_t *cb __attribute__((max_constant_size(32))), __global float2 * restrict gb)
                                                 ^

error: Compiler frontend failed (error code 59)

************************************************
FFTGeneratedStockhamAction::compileKernels failed
OPENCL_V_THROWERROR< CLFFT_INVALID_PROGRAM > (507): clfftEnqueueTransform failed
clFFT error condition reported:
OPENCL_V_THROWERROR< CLFFT_INVALID_PROGRAM > (507): clfftEnqueueTransform failed
Warning:  Program terminating, but clFFT resources not freed.
Please consider explicitly calling clfftTeardown( ).

I am sorry if this is not the right place for this bug report or if the error is on my side.

tingxingdong commented 7 years ago

There is a cast from float2 to float4. This cast is not used for every length, so some lengths pass and some fail. It seems that the ARM runtime does not support this cast.

b-sumner commented 7 years ago

I don't see a cast? The code simply seems to initialize a __global float4 * from a __global float2 *. The compiler should complain about that when there is no explicit cast.

bragadeesh commented 7 years ago

This was not a problem with other OpenCL compilers, which only raise a warning; in your case it fails with an error. I have checked in a change with an explicit cast in the develop branch (0f8fe79). Let me know if that fixes the issue for you.

tingxingdong commented 7 years ago

Yes, this is not an explicit cast. We do it explicitly in rocFFT.
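
For readers following the thread, the difference between the implicit and the explicit form can be sketched in ordinary host C. This is not the code from the commit mentioned above, only an illustration under stated assumptions: float2 and float4 are modeled as plain structs, the __global qualifier is left out, and scale_pairs is a hypothetical helper. Note also that in OpenCL C a float4 pointer is expected to be 16-byte aligned, so the explicit cast is only safe when the buffer address meets that alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins for the OpenCL C vector types (an assumption made so
   the sketch builds as ordinary C; the __global qualifier is left out). */
typedef struct { float x, y; } float2;
typedef struct { float x, y, z, w; } float4;

/* Hypothetical helper: multiply n complex values by s, handling two float2
   elements per iteration through a single float4 view. n must be even. */
void scale_pairs(float2 *bufOut, size_t n, float s)
{
    assert(n % 2 == 0);

    /* Implicit form, as in the build log; pointer types are incompatible:
           float4 *buff4g = bufOut;
       Explicit form, which states the reinterpretation: */
    float4 *buff4g = (float4 *)bufOut;

    for (size_t i = 0; i < n / 2; ++i) {
        buff4g[i].x *= s;   /* real part of element 2*i     */
        buff4g[i].y *= s;   /* imaginary part of element 2*i */
        buff4g[i].z *= s;   /* real part of element 2*i + 1  */
        buff4g[i].w *= s;   /* imaginary part of element 2*i + 1 */
    }
}
```

Calling scale_pairs on an array of four float2 values with s set to 2.0f doubles all eight floats in place. As with the OpenCL compilers discussed here, host C compilers differ in how they treat the implicit form: some only warn, others reject it.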

CluelessDuck commented 7 years ago

Thank you! This resolved the issue.