HPCE / hpce-2017-cw5

1 stars 6 forks source link

tbb does not execute all of its tasks in edit_distance #22

Closed natoucs closed 6 years ago

natoucs commented 6 years ago

I run this optimised code for edit_distance:

` tbb::parallel_for( start, end, [&](int j) {

     i =  end2 - (j - start);  //end2 defined outside tbb

     if( s[i-1] == t[j-1] ){ 
        d(i,j) = d(i-1,j-1); 
      }else{
        d(i,j) = 1 + std::min(std::min( d(i-1,j), d(i,j-1) ), d(i-1,j-1) ); 
      }       

} );

`

With a scale from 1 to 20, the reference and optimised always match. From 20 onwards, the outputs are not correct sometimes.

When I look into why, it seems some random indexes in the middle of the table have the value '0' (whereas they are surrounded by positive numbers only). An example for scale 20: screenshot from 2017-11-15 12-16-31

This could happen if tbb had not executed this task because I did not lay out my iteration space properly.

However when I look into the list of indexes tbb went through, the '0' value indexes are there, which means tbb run the task for that value of (i,j) but did not assign d(i,j) a value with the if/else.

How is that possible ? Why would a tbb task terminate before doing all it is supposed to do ? Should I use a more primitive form of tbb to have more control ?

crypto-perry commented 6 years ago

Well the chunks are not independent and you have no synchronization on shared resources. You can imagine when the second chunk starts all the values in the previous row are 0 as they were not yet filled by the computation of the first chunk. Hence the 0's you see.

natoucs commented 6 years ago

I think you misunderstood the problem. You do not take into account my outputs are equal most of the time. The problem arises sometimes when I scale up.

  1. the chunks are independent as the top left 3 squares needed are already computed (diagonal iterations)
  2. what do you mean by 'synchronization on shared ressources' ?
  3. there is only one 0 that appears in the table and it propagates to the output, making it incorrect. The rest of the table is the same as the reference table.

The only problem I am trying to solve is why tbb does not execute fully this 1 task out of 1000. This 1 error causes the answer to be wrong while the 999 other tasks work properly with no race condition and similar values to the reference.

The possibility I see right now is: two tasks are trying to reach out to the value of d(i-1,j) or d(i,j-1) at the same time, eventually causing one to not be able to access the value. Would that be possible ? Probably that is what you are referring to for point '2'.

natoucs commented 6 years ago

Ok got it !

The only thing I needed to change was to add int in front of i. (int i =... was declared outside of tbb).

Indeed, one needs to redeclare inside tbb a variable that is read-shared so that each parallel task is able to use it without creating a race condition.