CNugteren closed this 9 years ago
Hi Cedric,
I am having trouble understanding the scenario as well. The comment on lines 667 to 669 confuses me the most. In the example there are two matrices being allocated, but I could not figure out what K is.
Anyway, if I guessed correctly, this portion of the code is trying to discount the unused elements between M and lda in the last column if column-major is used, or between N and lda in the last row if row-major is used. I agree with your fix that if (offA + matrSize) > unusedTail is false, memUsed should be offA + matrSize instead of 0. But I also think that if (offA + matrSize) > unusedTail is true, memUsed should be offA + matrSize as well, since matrSize already excludes the unused tail: it is (N - 1) * lda + M instead of N * lda.
So I think those two lines can become memUsed = offA + matrSize;
Any thoughts?
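To make the arithmetic above concrete, here is a minimal sketch for the column-major case. The helper names matr_span and mem_used are hypothetical and are not taken from blas/generic/common.c; the sketch only assumes that matrSize is computed as (N - 1) * lda + M with lda >= M.

```c
#include <stddef.h>

/* Hypothetical helper, not the library's code: number of elements spanned by
 * an M x N column-major matrix with leading dimension lda (lda >= M assumed).
 * The last column contributes only M elements, so the unused tail
 * (lda - M) is already left out of the result. */
static size_t matr_span(size_t M, size_t N, size_t lda)
{
    if (M == 0 || N == 0) {
        return 0; /* an empty matrix touches no elements */
    }
    return (N - 1) * lda + M;
}

/* Hypothetical helper: elements of the buffer touched when the matrix
 * starts at element offset offA. No further tail correction is applied. */
static size_t mem_used(size_t offA, size_t M, size_t N, size_t lda)
{
    return offA + matr_span(M, N, lda);
}
```

For M = 3, N = 4, lda = 5 this gives a span of 18 elements, whereas N * lda would give 20; the difference of 2 is exactly the unused tail lda - M, which is why subtracting it a second time looks unnecessary.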
I agree with your guess. It seems like the tail is already taken into account with matrSize, so yes, I don't really see why you should subtract it. I was actually hoping that you would know what this code meant :-)
There are some other parts I don't understand:
What does ( offA + matrSize ) > unusedTail really mean?
offA + matrSize < offA. This only makes sense if matrSize (size_t) can become negative, right? Why would that be the case?
I made some changes to remove the if-statement altogether (see new commit).
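One possible reading of the offA + matrSize < offA comparison, offered as an assumption and not as the original author's confirmed intent: size_t is unsigned, so matrSize cannot be negative, but the sum can wrap around modulo SIZE_MAX + 1. In that case the result is smaller than either operand, which is a common C idiom for detecting unsigned overflow. The helper name add_wraps below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if offA + matrSize wraps around in
 * size_t arithmetic, 0 otherwise. Unsigned addition in C is defined to
 * wrap modulo SIZE_MAX + 1, so a wrapped sum is always less than offA. */
static int add_wraps(size_t offA, size_t matrSize)
{
    return offA + matrSize < offA;
}
```

With this reading, add_wraps(SIZE_MAX, 1) reports a wrap-around, while add_wraps(10, 20) does not.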
Most people will allocate the whole memory without taking the unused tail out, so I think this check is already overkill. I would merge this pull request. We can reopen this discussion later if needed.
When computing the required size of an OpenCL buffer for a matrix, there is a check for a corner case, described in blas/generic/common.c as:
I don't doubt this corner case; however, I am having trouble with the regular scenario, as the memory used is then set to zero:
I applied a 'fix' in two cases, one for regular matrices and one for banded matrices. Note that this computation (as far as I see it) affects whether or not an error-code is returned.
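As a rough illustration of why a memUsed of zero matters, the sketch below assumes that the computed value is compared against the number of elements available in the buffer. The function name check_buffer and the status codes are hypothetical and are not the library's actual API.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum { CHECK_OK = 0, CHECK_INSUFFICIENT_MEMORY = -1 };

/* Hypothetical check: the buffer holds bufElems elements and the
 * operation is assumed to touch memUsed elements from its start. */
static int check_buffer(size_t bufElems, size_t memUsed)
{
    return (memUsed > bufElems) ? CHECK_INSUFFICIENT_MEMORY : CHECK_OK;
}
```

Under this assumption, a buffer of 20 elements fails the check when memUsed is 28, but passes when memUsed is wrongly left at 0, so an undersized buffer would go unreported.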