doe300 / VC4CL

OpenCL implementation running on the VideoCore IV GPU of the Raspberry Pi models
MIT License

Wrong result? #27

Closed nomaddo closed 6 years ago

nomaddo commented 6 years ago

I am trying to check that VC4CL works correctly. For the following code, the output differs from what I expected and from the output in an NVIDIA environment.

kernel

 kernel void hello(global float * x){
   int ind = get_global_id(0);
   x[ind] = x[ind] * 2;
 }

The host code can be found at https://github.com/nomaddo/opencl-benchmark/blob/master/gpu.c . It takes the kernel file name, the function name, the number of arguments, and the length of each argument, then executes the kernel and prints all arguments.

 pi@nomaddo-pi3:~/opencl-benchmark$ sudo ./gpu mul.cl hello 1 10                                                                                                                          
 0.000000                                                                                                                                                                                 
 1.000000                                                                                                                                                                                 
 2.000000                                                                                                                                                                                 
 3.000000                                                                                                                                                                                 
 4.000000                                                                                                                                                                                 
 10.000000                                                                                                                                                                                
 6.000000                                                                                                                                                                                 
 7.000000                                                                                                                                                                                 
 8.000000                                                                                                                                                                                 
 9.000000                                                                                                                                                                                 
 Runtime: 0.000419ms   

The expected result (which is also what an NVIDIA GPU produces) is as follows:

nomaddo@nomaddo-AS:~/opencl-benchmark$ ./gpu mul.cl hello 1 10
0.000000
2.000000
4.000000
6.000000
8.000000
10.000000
12.000000
14.000000
16.000000
18.000000
Runtime: 0.000070ms

I use the latest VC4C built by CircleCI and a self-compiled VC4CL, which was also built from the latest source code.
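
For reference, the expected values can be reproduced on the CPU and compared against whatever the device returns. The following is a minimal sketch in C, assuming the host program initializes the buffer as x[i] = i (this is inferred from the printed outputs, not confirmed from gpu.c). The helper names `reference_double` and `count_mismatches` are hypothetical and are not part of gpu.c or VC4CL.

```c
#include <stddef.h>

/* CPU reference for the kernel: out[i] = in[i] * 2. */
static void reference_double(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 2.0f;
    }
}

/* Count the elements where the device result differs from the
 * expected value by more than tol (absolute difference). */
static size_t count_mismatches(const float *expected, const float *actual,
                               size_t n, float tol) {
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float diff = expected[i] - actual[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > tol) {
            ++bad;
        }
    }
    return bad;
}
```

Under that assumption, passing the ten values printed on the Pi as `actual` would flag every element except indices 0 and 5, which matches the symptom: only some work-items appear to have written their result.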

harleyzhang commented 6 years ago

You may want to check this issue:

https://github.com/doe300/VC4C/issues/30

There are issues in the implementation. I tried to help; however, my knowledge of VideoCore programming is rather limited.

doe300 commented 6 years ago

Can you also re-check with the latest VC4CL?

nomaddo commented 6 years ago

Thanks. The output became correct.