GPUOpen-ProfessionalCompute-Libraries / amdovx-modules

AMD OpenVX modules: such as, neural network inference, 360 video stitching, etc.
110 stars 50 forks

ERROR: clEnqueueNDRangeKernel(supernode,1,*,{1167360,0,0},...) failed(-54) for com.amd.loomsl.half_scale_gaussian #14

arpu closed this issue 7 years ago

arpu commented 7 years ago

I get this on Fedora 24 with an Intel GPU (for initial testing):

loom_shell 0.9.6 [loomsl 0.9.6]
... processing commands from test
..camera_params cam1_par declared
..camera_params cam2_par declared
..camera_params cam3_par declared
..ls_context context[1] created
..lsCreateContext: created context context[0]
..lsSetOutputConfig: successful for context[0]
..lsSetCameraConfig: successful for context[0]
..lsSetCameraParams: successful for context[0] and camera#0
..lsSetCameraParams: successful for context[0] and camera#1
..lsSetCameraParams: successful for context[0] and camera#2
WARNING: AllocateInternalTablesForCamera: ExpComp has been disabled because of not enough overlap
WARNING: AllocateInternalTablesForCamera: SeamFind has been disabled because of not enough overlap
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
OK: OpenVX using GPU device#0 (Intel(R) HD Graphics Skylake Desktop GT2) [OpenCL 1.2 beignet 1.3] [1]
..lsInitialize: successful for context[0] (18153.606 ms)
..cl_context opencl_context[1] created
..cl_mem buf[2] created
..lsGetOpenCLContext: get OpenCL context opencl_context[0] from context[0]
OK: loaded CAM00.bmp
OK: loaded CAM01.bmp
OK: loaded CAM02.bmp
..lsSetCameraBuffer: set OpenCL buffer buf[0] for context[0]
..lsSetOutputBuffer: set OpenCL buffer buf[1] for context[0]
ERROR: clEnqueueNDRangeKernel(supernode,1,*,{1167360,0,0},...) failed(-54) for com.amd.loomsl.half_scale_gaussian
ERROR: OpenVX call failed with status = (-1) at /home/arpu/Work/githubsources/amdovx-modules/vx_loomsl/live_stitch_api.cpp#2666
ERROR: lsScheduleFrame() failed (-1) @iter:0
... exit from test

what is the best way to debug this?

arpu commented 7 years ago

Looks like this is a core problem, because a simple runvx gets the same error:

runvx canny.gdf
runvx 0.9.6
OK: using AMD OpenVX 0.9.6
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
OK: OpenVX using GPU device#0 (Intel(R) HD Graphics Skylake Desktop GT2) [OpenCL 1.2 beignet 1.3] [1]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
OK: capturing 640x480 image(s) into 640x480 RGB image buffer
ERROR: clEnqueueNDRangeKernel(supernode,2,*,80x480,...) failed(-54) for group#1
ERROR: vxScheduleGraph() failed (-1:VX_FAILURE)
OK: OpenCL buffer usage: 3384720, 2/5

arpu commented 7 years ago

runvx -v -affinity:CPU canny.gdf works fine (without opencl)

rgiduthuri commented 7 years ago

From the logs, it looks like you're using a non-AMD GPU. Did you try running it on an AMD GPU? @lcskrishna made contributions earlier to emulate AMD GPU specific operations using standard OpenCL primitives. You might be encountering other limitations not related to AMD specific extensions. @lcskrishna, any thoughts?

arpu commented 7 years ago

Yes, exactly: I use the standard built-in Intel GPU for development. No, I have not tested on an AMD GPU.

lcskrishna commented 7 years ago

Hello, from the logs, it looks like there is an issue with the work-group dimensions given.

arpu commented 7 years ago

Is there anything I can do or debug to fix this? Error -54 is CL_INVALID_WORK_GROUP_SIZE.
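For reference, a minimal sketch of two conditions under which OpenCL 1.2 returns CL_INVALID_WORK_GROUP_SIZE from clEnqueueNDRangeKernel when a local size is passed explicitly: the global size is not evenly divisible by the local size, or the total number of work-items in a work-group exceeds the allowed maximum. The log prints the local size as `*`, so the values actually used by the library are not known here. `check_work_group` and its parameters are hypothetical names for illustration, not part of amdovx or OpenCL, and the sketch does not cover a mismatch with a kernel's reqd_work_group_size attribute.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an amdovx or OpenCL API): checks an explicit
// local work size against two CL_INVALID_WORK_GROUP_SIZE conditions.
// max_work_group_size is assumed to come from a device or kernel query.
std::string check_work_group(const std::vector<std::size_t>& global,
                             const std::vector<std::size_t>& local,
                             std::size_t max_work_group_size) {
    if (global.size() != local.size()) return "dimension mismatch";
    std::size_t total = 1;  // work-items per work-group
    for (std::size_t d = 0; d < global.size(); ++d) {
        if (local[d] == 0) return "local size is zero";
        // Each global dimension must be a multiple of the local dimension.
        if (global[d] % local[d] != 0) return "global not divisible by local";
        total *= local[d];
    }
    // The product of the local dimensions must fit in one work-group.
    if (total > max_work_group_size) return "work-group too large";
    return "ok";
}
```

With the 1-D global size 1167360 from the log, a local size of 256 would pass against a limit of 512, while a local size of 1024 would be rejected as too large for that limit. These local sizes and limits are illustrative only.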

arpu commented 7 years ago

I have done some testing with OpenCV:

int dbsize = cv::ocl::Device::getDefault().maxComputeUnits();
size_t wgs = cv::ocl::Device::getDefault().maxWorkGroupSize();
cout << dbsize << endl;
cout << wgs << endl;

For this device:

name : Intel(R) HD Graphics Skylake Desktop GT2
available : 1
imageSupport : 1
OpenCL_C_Version : OpenCL C 1.2 beignet 1.3
8 ... maxComputeUnits();
1024 ... maxWorkGroupSize();

arpu commented 7 years ago

Today I bought a new AMD Polaris graphics card, but now I get this error :/

runvx 0.9.6
OK: using AMD OpenVX 0.9.6
OK: OpenVX using GPU device#0 (Baffin) [OpenCL 1.2 AMD-APP (2348.3)] [SvmCaps 0 1]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
OK: capturing 640x480 image(s) into 640x480 RGB image buffer
ERROR: clEnqueueNDRangeKernel(testsupernode,2,*,80x480,...) failed(-63) for group#1
ERROR: vxScheduleGraph() failed (-1:VX_FAILURE)
OK: OpenCL buffer usage: 3384720, 2/5

Maybe it has something to do with the Fedora Linux host?

gshisha commented 7 years ago

Can you post your canny.gdf here? The error you see (-63) is an invalid global work size. We will try to repro the issue.
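To make the numeric codes in these logs easier to read, here is a small sketch that maps a few OpenCL status codes to their names as defined in CL/cl.h. `cl_error_name` is a hypothetical helper written for this thread, not a function provided by OpenCL or amdovx, and it covers only a handful of codes.

```cpp
#include <string>

// Hypothetical helper: names for a few OpenCL status codes from CL/cl.h.
// Only a subset is listed; anything else is reported as unknown.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -53: return "CL_INVALID_WORK_DIMENSION";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";   // seen on Intel/beignet above
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -56: return "CL_INVALID_GLOBAL_OFFSET";
        case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";  // seen on AMD Baffin above
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}
```

Note that the -1 in `vxScheduleGraph() failed (-1:VX_FAILURE)` is an OpenVX status, not an OpenCL one, so it should not be passed through this lookup.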

arpu commented 7 years ago

It's from the core samples. File canny.gdf:
https://github.com/GPUOpen-ProfessionalCompute-Libraries/amdovx-core/tree/master/runvx
http://lpaste.net/354545

arpu commented 7 years ago

For the built-in Intel GPU:

XCAM DEBUG cl_device.cpp:132: cl get device info, max_compute_unit:24 max_work_item_dims:3 max_work_item_sizes:{512, 512, 512} max_work_group_size:512
XCAM DEBUG cl_device.cpp:54: CL device constructed

And for NVIDIA I get:

XCAM DEBUG cl_device.cpp:132: cl get device info, max_compute_unit:8 max_work_item_dims:3 max_work_item_sizes:{1024, 1024, 64} max_work_group_size:1024
XCAM DEBUG cl_device.cpp:54: CL device constructed

maybe this helps?

arpu commented 7 years ago

could you repro this issue?

gshisha commented 7 years ago

I tried on ubuntu last week, seems to be working fine there. Will continue on it next week... as I'm travelling

omaralvarez commented 7 years ago

I have this same issue running on Debian 8. NVIDIA OpenCL 1.2, any updates? I also have issues with clFillBuffer (allocateBuffer in amdovx-core), it segfaults.

sheldonrobinson commented 7 years ago

I believe the library supports OpenCL only on AMD GPUs. It's not yet hardware agnostic

arpu commented 7 years ago

Why do you think this? @lcskrishna has done some awesome OpenCL work for Intel.

arpu commented 7 years ago

@gshisha any news on this?

arpu commented 7 years ago

I will close this bug here because it looks like an amdovx-core problem.