ROCm / MIVisionX

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
https://rocm.docs.amd.com/projects/MIVisionX/en/latest/
MIT License

LoomSL - shader build fails on start for gainmatrix #396

Open arpu opened 4 years ago

arpu commented 4 years ago

Hey

With this config:

    setGlobalAttribute(0,1); // # 0 -- Profiler::0:OFF 1:ON Default:OFF

    setGlobalAttribute(7,0); // # simple/quality stitch

    // Turn Off/ON ExpoComp
    setGlobalAttribute(1,1); // # 1 -- ExpoComp::0:OFF 1:ON Default:ON

    // Turn Off/ON SeamFind
    setGlobalAttribute(2,0); // # 2 -- SeamFind::0:OFF 1:ON Default:ON

    // Turn Off/ON Multiband & Num Bands
    setGlobalAttribute(5,0); // # 5 -- Multiband::0:OFF 1:ON Default:ON

    setGlobalAttribute(6,4); // # 6 -- Multiband Bands
    // setGlobalAttribute(56,1); // # 6 -- Multiband Bands

the shader build fails on start:

LOG:[status=-1] ERROR: clBuildProgram(0x7fd1ed536c30,-cl-std=CL1.2) failed(-11) for com.amd.loomsl.expcomp_compute_gainmatrix
ERROR: OpenVX call failed with status = (-1) at /home/arpu/Work/githubsources/MIVisionX/amd_openvx_extensions/amd_loomsl/live_stitch_api.cpp#2758

rocminfo:

ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3900                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            4                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16307136(0xf8d3c0) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16307136(0xf8d3c0) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26607(0x67ef)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1210                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            14                                 
  SIMDs per CU:            4                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***  

kiritigowda commented 4 years ago

@arpu can you attach the loom shell script and test case images for this bug?

arpu commented 4 years ago

data from sample-1

loom_shell 0.9.9 [loomsl 0.9.9]
... processing commands from loomStitch-sample1.txt
..ls_context context[1] created
..lsCreateContext: created context context[0]
..lsSetOutputConfig: successful for context[0]
..lsSetCameraConfig: successful for context[0]
OK: OpenVX using GPU device#0 (gfx803) [OpenCL 1.2 ] [SvmCaps 0 1]
ERROR: clBuildProgram(0xdc5c00,-cl-std=CL1.2) failed(-11) for com.amd.loomsl.expcomp_compute_gainmatrix
ERROR: OpenVX call failed with status = (-1) at /home/arpu/Work/githubsources/MIVisionX/amd_openvx_extensions/amd_loomsl/live_stitch_api.cpp#2758
ERROR: lsInitialize(context[0]) failed (-1) @loomStitch-sample1.txt#24
... exit from loomStitch-sample1.txt

loomStitch-sample1.txt

arpu commented 4 years ago

Is there anything else I can provide?

kiritigowda commented 4 years ago

@arpu can you check why you are using OpenCL 1.2?

OK: OpenVX using GPU device#0 (gfx803) [OpenCL 1.2 ] [SvmCaps 0 1]

ROCm installs OpenCL 2.0+

arpu commented 4 years ago

Tested with OpenCL 2.0; same error.

arpu@hokuspokus ~/Work/githubsources/vreenproducer_test/sample $ /usr/local/bin/loom_shell loomStitch-sample1.txt
loom_shell 0.9.9 [loomsl 0.9.9]
... processing commands from loomStitch-sample1.txt
..ls_context context[1] created
..lsCreateContext: created context context[0]
..lsSetOutputConfig: successful for context[0]
..lsSetCameraConfig: successful for context[0]
OK: CL_VERSION_2_0 OpenVX using GPU device#0 (gfx803) [OpenCL C 2.0 ] [SvmCaps 0 0]
ERROR: clBuildProgram(0xfc1cd0,-cl-std=CL2.0) failed(-11) for com.amd.loomsl.expcomp_compute_gainmatrix
ERROR: OpenVX call failed with status = (-1) at /home/arpu/Work/githubsources/MIVisionX/amd_openvx_extensions/amd_loomsl/live_stitch_api.cpp#2758
ERROR: lsInitialize(context[0]) failed (-1) @loomStitch-sample1.txt#24
... exit from loomStitch-sample1.txt
arpu@hokuspokus ~/Work/githubsources/vreenproducer_test/sample $ ldd /usr/local/bin/loom_shell loomStitch-sample1.txt
/usr/local/bin/loom_shell:
    linux-vdso.so.1 (0x00007ffcea98f000)
    libvx_loomsl.so => /usr/local/lib/libvx_loomsl.so (0x00007f373e655000)
    libopenvx.so => /usr/local/lib/libopenvx.so (0x00007f373e433000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f373e3f4000)
    libOpenCL.so.1 => /opt/rocm-3.9.1/opencl/lib/libOpenCL.so.1 (0x00007f373e1ec000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f373e004000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f373debe000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f373dea1000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f373dcd6000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f373dccf000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f373e733000)
loomStitch-sample1.txt:

kiritigowda commented 3 years ago

@paveltc can you check if this issue is still present on top of tree (TOT)?

arpu commented 3 years ago

Tested with the latest master version; still the same error.

 loom_shell test.txt
loom_shell 0.9.9 [loomsl 0.9.9]
... processing commands from test.txt
..ls_context context[1] created
..lsCreateContext: created context context[0]
..lsSetOutputConfig: successful for context[0]
..lsSetCameraConfig: successful for context[0]
OK: OpenVX using GPU device#0 (gfx803) [OpenCL 1.2 ] [SvmCaps 0 1]
ERROR: clBuildProgram(0x7fc65fcb9e80,-cl-std=CL1.2) failed(-11) for com.amd.loomsl.expcomp_compute_gainmatrix
ERROR: OpenVX call failed with status = (-1) at /home/arpu/Work/githubsources/MIVisionX_2021/amd_openvx_extensions/amd_loomsl/live_stitch_api.cpp#2758
ERROR: lsInitialize(context[0]) failed (-1) @test.txt#24
... exit from test.txt

paveltc commented 3 years ago

@kiritigowda I see the same error. You need to set the ExpoComp variable to ON in the loomShell script to see this problem. With ExpoComp off, this doesn't happen.

kiritigowda commented 3 years ago

@rrawther can you take a look at this?

ppanchad-amd commented 4 months ago

@arpu Apologies for the delayed response. Can you please test with the latest ROCm 6.1.2? If the issue is resolved, please close the ticket. Thanks!