How code can be synchronize ...

RoshanGerard / aparapi

Automatically exported from code.google.com/p/aparapi

How code can be synchronize ... #66

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. I am doing vector-vector multiplication. After the multiplication, how can I
update a global variable, i.e. synchronize the update? I know Aparapi does not
support synchronized blocks or methods. Is there any way to do this?
2.
public void run() {
    int globalId = getGlobalId();
    int sum1 = vecA[globalId] * vecB[globalId];
    sum[0] = sum[0] + sum1;
}

3.

What is the expected output? What do you see instead?
A single value of sum.

What version of the product are you using? On what operating system?
Platform Version: OpenCL 1.2 AMD-APP (938.1)
Board name:AMD FireStream 9350
aparapi_2011_09_13
OS - CentOS 5.4

Please provide any additional information below.
Without synchronization it gives a garbage value.

Original issue reported on code.google.com by kri22go...@gmail.com on 31 Aug 2012 at 12:00

GoogleCodeExporter commented 9 years ago

So this is not really a bug report or an issue ;) This question should probably
be on the discussion list. However ;) let's try to help.

As you discovered, there is a race condition in this code. Every
'thread' is trying to update the single value at sum[0].

For this particular problem you can use Kernel.atomicAdd(int[] _arr,
int _index, int _delta)

So you could try :-

atomicAdd(sum, 0, sum1);
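
To see why the atomic update removes the race, here is a minimal CPU-side sketch in plain Java. It is an analogy, not Aparapi code: a parallel IntStream stands in for the kernel's work-items, and java.util.concurrent.atomic.AtomicIntegerArray stands in for the sum array updated through atomicAdd. The class name AtomicSumDemo is made up for illustration.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.IntStream;

// CPU-side analogy only (assumption: not Aparapi code).
class AtomicSumDemo {

    static int dot(int[] vecA, int[] vecB) {
        // One-element array, mirroring sum[0] in the original kernel.
        AtomicIntegerArray sum = new AtomicIntegerArray(1);

        // Each parallel iteration plays the role of one work-item.
        IntStream.range(0, vecA.length).parallel().forEach(globalId -> {
            int sum1 = vecA[globalId] * vecB[globalId];
            // The read-modify-write is one indivisible step, so no update is lost.
            // This is the counterpart of atomicAdd(sum, 0, sum1).
            sum.addAndGet(0, sum1);
        });

        return sum.get(0);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {4, 5, 6};
        System.out.println(dot(a, b)); // prints 32
    }
}
```

Every work-item still contends for the same memory location, so the updates are serialized; that is the cost the alternative suggestions in this thread try to avoid.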

Let me know if it works and I will close this.

Gary

Original comment by frost.g...@gmail.com on 31 Aug 2012 at 12:19

GoogleCodeExporter commented 9 years ago

You could also simply create an array of results on the GPU and sum them on the 
CPU side. You should perform a performance comparison between the two 
approaches to see which one works best for your needs.

Original comment by ryan.lam...@gmail.com on 1 Sep 2012 at 1:38

GoogleCodeExporter commented 9 years ago

Thanks Gary.
It is working fine.
If I use a float array instead of int, no atomicAdd(...) or similar function is
available. Any clue for float?
Next time I will put this kind of question on the discussion list.

Thanks Ryan, I will try your approach also.

Original comment by kri22go...@gmail.com on 3 Sep 2012 at 9:53

GoogleCodeExporter commented 9 years ago

Yes, it was just a suggestion. You could also try increasing your range and
performing the reduction on the GPU side.

For example:

if(id < value) {
  // multiply and store intermediate results
} else {
  // sum intermediate results
}
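
One caveat with the single-range outline above: work-items in the summing branch would need the multiply branch to have finished first, and a single launch generally does not guarantee that ordering. A common way around it is to split the work into passes. The sketch below shows that idea in plain Java as an assumption-laden analogy, not Aparapi code: each parallel IntStream stands in for one kernel launch, and the class name TwoPassReductionDemo is made up for illustration.

```java
import java.util.stream.IntStream;

// CPU-side analogy only (assumption: each parallel loop models one kernel launch).
class TwoPassReductionDemo {

    static int dot(int[] vecA, int[] vecB) {
        int n = vecA.length;
        if (n == 0) {
            return 0;
        }
        int[] partial = new int[n];

        // Pass 1: multiply and store intermediate results, one slot per work-item.
        IntStream.range(0, n).parallel()
                 .forEach(id -> partial[id] = vecA[id] * vecB[id]);

        // Later passes: pairwise (tree) sum. The stride doubles each pass, and
        // the end of each parallel loop acts as the barrier between passes.
        for (int stride = 1; stride < n; stride *= 2) {
            final int s = stride;
            int pairs = (n + 2 * s - 1) / (2 * s);
            IntStream.range(0, pairs).parallel().forEach(id -> {
                int left = id * 2 * s;
                int right = left + s;
                // Each work-item writes only its own 'left' slot, so there is no race.
                if (right < n) {
                    partial[left] += partial[right];
                }
            });
        }
        return partial[0];
    }

    public static void main(String[] args) {
        System.out.println(dot(new int[]{1, 2, 3}, new int[]{4, 5, 6})); // prints 32
    }
}
```

The number of summing passes grows with the logarithm of the vector length, and no atomic operation is needed because no two work-items ever write the same slot in a pass.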

Original comment by ryan.lam...@gmail.com on 3 Sep 2012 at 11:21

GoogleCodeExporter commented 9 years ago

Original comment by ryan.lam...@gmail.com on 21 Apr 2013 at 9:47

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

I have a query regarding comment #2. How can we create an array of results on
the GPU (inside the kernel method?) and use it on the CPU? Can you show me in
code?

Original comment by shrutira...@gmail.com on 22 Apr 2013 at 6:06

GoogleCodeExporter commented 9 years ago

Hello shrutiranade38,

The idea is that you pass an array to your Kernel before execution, populate it 
during execution and then retrieve the array once the kernel execution has 
completed.
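
In code, those three steps might look like the sketch below. To keep it runnable without the Aparapi jar, it uses MiniKernel, a hypothetical stand-in written here that only mimics the shape of a kernel class (run(), getGlobalId(), execute(range)) and runs the work-items sequentially on the CPU. It is not the Aparapi API, and all class names are made up for illustration.

```java
// Hypothetical stand-in for a kernel base class (assumption: NOT the Aparapi API).
// It runs the "work-items" one after another on the CPU.
abstract class MiniKernel {
    private int currentId;

    protected int getGlobalId() {
        return currentId;
    }

    public abstract void run();

    public void execute(int range) {
        for (int id = 0; id < range; id++) {
            currentId = id;
            run();
        }
    }
}

class ArrayOfResultsDemo {

    static int dot(final int[] vecA, final int[] vecB) {
        // Step 1: create the result array on the host and hand it to the kernel.
        final int[] products = new int[vecA.length];

        MiniKernel kernel = new MiniKernel() {
            @Override
            public void run() {
                int globalId = getGlobalId();
                // Step 2: each work-item writes only its own slot, so there is no race.
                products[globalId] = vecA[globalId] * vecB[globalId];
            }
        };
        kernel.execute(vecA.length);

        // Step 3: after execution completes, read the array back and sum on the CPU.
        int sum = 0;
        for (int p : products) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new int[]{1, 2, 3}, new int[]{4, 5, 6})); // prints 32
    }
}
```

With the real library the anonymous class would extend Aparapi's Kernel instead of MiniKernel. Because no work-item shares a slot with another, the same pattern should carry over to a float array, where no atomicAdd is available.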

Original comment by ryan.lam...@gmail.com on 22 Apr 2013 at 7:51