clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0
843 stars 237 forks source link

Updated the computation of matrix memory requirements #127

Closed CNugteren closed 9 years ago

CNugteren commented 9 years ago

When computing the requirement of an OpenCL buffer in the case of a matrix, there is a check for a corner case, described in blas/generic/common.c as:

It is possible to allocate a buffer, and set up lda & ldb such that it looks like it will access outside of the allocated buffer, but if M & N are kept small enough, no out of bounds access will occur. Compensate for the offset values and the unused tail memory caused by lda & ldb.

I don't doubt this corner-case, however, I am having trouble with the regular scenario, as the memory used is then set to zero:

if (( offA + matrSize ) > unusedTail) {
  memUsed = offA + matrSize - unusedTail;
} else {
  memUsed = 0; // Shouldn't this be "offA + matrSize"?
}

I applied a 'fix' in two cases, one for regular matrices and one for banded matrices. Note that this computation (as far as I see it) affects whether or not an error-code is returned.

TimmyLiu commented 9 years ago

Hi Cedric,

I am having trouble understanding the scenario as well. The comment from line 667 to 669 confuses me the most. In the example there are two matrix being allocated but I could not figure out what is K.

Anyway, if I guessed correctly, this portion of the code is trying to discount the elements between lda and M in the last column if column major is used or between lda and N in the last row if row major is used. I agree with your fix that if ( offA + matrSize ) > unusedTail) is false memUsed should be offA + matrSize instead of 0. But I also think if ( offA + matrSize ) > unusedTail) is true memUsed should be offA + matrSize as well since matrSize already does not count the unusedTail. (((N - 1) * lda + M) instead of N*lda )

so I think those two lines can become memUsed = offA + matrSize;

any thoughts?

CNugteren commented 9 years ago

I agree with your guess. It seems like the tail is already taken into account with matrSize, so yes, I don't really see why you should subtract it. I was hoping actually that you would knew what this code meant :-)

There are some other parts I don't understand:

I made some changes to remove the if-statement altogether (see new commit).

TimmyLiu commented 9 years ago

Most people will allocate the whole memory without taking unused tail out. So I think this check is already a overkill. I would merge this pull request. We can reopen this discussion later if needed.