Hi,

In the sger kernel, we have:

if( row + float4size < numrows ) {
    // use vector maths
} else {
    for( i = row; i < numrows; i++ ) {
        // use scalar maths
    }
}

Per my understanding:
Will this systematically execute both branches of the 'if' whenever the threads of a warp disagree on the condition?
And in the second branch, will some threads loop over every row in the matrix?
And therefore, since all threads in the warp must run in lock-step, will every thread in the warp actually iterate over every row in the entire matrix?