Closed CNugteren closed 7 years ago
I just checked out the source code. It seems that incx and incy are indeed taken into account: the code refers to X[i * incx] everywhere.
Hmm, maybe something went wrong in the call stack.
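For readers unfamiliar with the X[i * incx] pattern, here is a minimal sketch of strided vector access in plain C. It is not taken from the clBLAS sources: the function names are made up for illustration, and only positive increments are handled. It also shows the smallest buffer length that keeps every strided read in bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: minimum number of elements a buffer must hold so
 * that n logical entries spaced inc apart are all in bounds.
 * Assumes inc > 0; negative increments are not covered by this sketch. */
size_t strided_min_len(size_t n, int inc)
{
    assert(inc > 0);
    return n == 0 ? 0 : 1 + (n - 1) * (size_t)inc;
}

/* Hypothetical helper: logical element i of a strided vector,
 * i.e. the X[i * incx] access pattern. */
float strided_get(const float *x, size_t i, int inc)
{
    assert(inc > 0);
    return x[i * (size_t)inc];
}

/* Returns 1 if the two X arrays from the report describe the same
 * logical 5-element vector when read with incx = 1 and incx = 2. */
int report_vectors_agree(void)
{
    static const float x1[] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    static const float x2[] = { 1.0f, 1.0f, 2.0f, 2.0f, 3.0f,
                                3.0f, 4.0f, 4.0f, 5.0f, 5.0f };
    for (size_t i = 0; i < 5; ++i) {
        if (strided_get(x1, i, 1) != strided_get(x2, i, 2))
            return 0;
    }
    return 1;
}
```

With N = 5 and incx = 2 the minimum length is 9 elements, so a buffer of N * incx = 10 elements is slightly larger than strictly needed, which is harmless.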
On Tue, Mar 8, 2016 at 2:01 PM, Cedric Nugteren notifications@github.com wrote:
I am having trouble understanding the results of the her2/hpr2/syr2/spr2 family of BLAS level-2 routines when using increments other than 1 for the X or Y vector inputs. For example, I tried running the example example_ssyr2.c with vector increments.
As reference, the default case (as it is included in the repository) outputs the following:
A matrix:
( 1.0) ( 2.0) ( 3.0) ( 4.0) ( 5.0)
( 0.0) ( 6.0) ( 7.0) ( 8.0) ( 9.0)
( 0.0) ( 0.0) ( 10.0) ( 11.0) ( 12.0)
( 0.0) ( 0.0) ( 0.0) ( 13.0) ( 14.0)
( 0.0) ( 0.0) ( 0.0) ( 0.0) ( 15.0)
Result:
( 101.0) ( 142.0) ( 183.0) ( 224.0) ( 265.0)
( 0.0) ( 166.0) ( 187.0) ( 208.0) ( 229.0)
( 0.0) ( 0.0) ( 190.0) ( 191.0) ( 192.0)
( 0.0) ( 0.0) ( 0.0) ( 173.0) ( 154.0)
( 0.0) ( 0.0) ( 0.0) ( 0.0) ( 115.0)
Then, I replace the following:
static const cl_float X[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
static const int incx = 1;
with:
static const cl_float X[] = { 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0 };
static const int incx = 2;
After also changing the buffer size (multiply by incx) and the clEnqueueWriteBuffer (also multiply by incx), I expect the same results. Instead, I obtain:
Result:
( 101.0) ( 132.0) ( 163.0) ( 204.0) ( 255.0)
( 0.0) ( 126.0) ( 117.0) ( 128.0) ( 159.0)
( 0.0) ( 0.0) ( 70.0) ( 51.0) ( 62.0)
( 0.0) ( 0.0) ( 0.0) ( 13.0) ( 14.0)
( 0.0) ( 0.0) ( 0.0) ( 0.0) ( 15.0)
I was expecting exactly the same output, but this is not the case. Perhaps I misunderstand the BLAS specs, or perhaps there is a bug in clBLAS? Is this case tested at all? The errors seem to manifest only in blocks of 32 by 32, so for larger matrices the error is less (or not at all) present.
The same problem occurs when changing incy instead of incx.
This is with clBLAS 2.10 on an Intel CPU on Linux, but I see similar behaviour on OS X with an AMD GPU.
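The expectation that both layouts of X give the same output can be checked on the CPU with a plain reference implementation of the rank-2 update A := alpha*x*y^T + alpha*y*x^T + A. The sketch below is not clBLAS code: the function names are hypothetical, only positive increments are handled, and the matrix is treated as row-major with the upper triangle stored, matching the printouts above. The values alpha = 10 and Y = { 5, 4, 3, 2, 1 } do not appear in the report; they are an assumption inferred from the default Result (any pair with alpha*Y = { 50, 40, 30, 20, 10 } would give the same numbers).

```c
#include <assert.h>
#include <stddef.h>

enum { N = 5 };

/* The A matrix as printed in the report (row-major, upper triangle). */
static const float A0[N * N] = {
     1.0f,  2.0f,  3.0f,  4.0f,  5.0f,
     0.0f,  6.0f,  7.0f,  8.0f,  9.0f,
     0.0f,  0.0f, 10.0f, 11.0f, 12.0f,
     0.0f,  0.0f,  0.0f, 13.0f, 14.0f,
     0.0f,  0.0f,  0.0f,  0.0f, 15.0f
};

/* Hypothetical reference routine: rank-2 update on the upper triangle
 * of a row-major n x n matrix. Assumes incx > 0 and incy > 0. */
void ref_ssyr2_upper(size_t n, float alpha,
                     const float *x, int incx,
                     const float *y, int incy,
                     float *a, size_t lda)
{
    assert(incx > 0 && incy > 0 && lda >= n);
    for (size_t i = 0; i < n; ++i) {
        const float xi = x[i * (size_t)incx];
        const float yi = y[i * (size_t)incy];
        for (size_t j = i; j < n; ++j) {   /* upper triangle only */
            const float xj = x[j * (size_t)incx];
            const float yj = y[j * (size_t)incy];
            a[i * lda + j] += alpha * (xi * yj + yi * xj);
        }
    }
}

/* Runs the update with X stored at stride incx (1 or 2, as in the
 * report) and writes the N*N result into out. */
void run_report_case(int incx, float *out)
{
    static const float x1[] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    static const float x2[] = { 1.0f, 1.0f, 2.0f, 2.0f, 3.0f,
                                3.0f, 4.0f, 4.0f, 5.0f, 5.0f };
    static const float y[]  = { 5.0f, 4.0f, 3.0f, 2.0f, 1.0f }; /* assumed */
    const float alpha = 10.0f;                                  /* assumed */

    assert(incx == 1 || incx == 2);
    for (size_t k = 0; k < (size_t)(N * N); ++k)
        out[k] = A0[k];
    ref_ssyr2_upper(N, alpha, incx == 2 ? x2 : x1, incx, y, 1, out, N);
}
```

Under these assumptions, calling run_report_case with incx = 1 and with incx = 2 produces identical matrices, and both match the default Result shown in the report, which supports the reading that a strided X should not change the output.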
— Reply to this email directly or view it on GitHub https://github.com/clMathLibraries/clBLAS/issues/237.
Tingxing dong
I tested your modifications of the test program above, and now it works for that particular case (incx=2). It should have been fixed by PR #289.
That's a long time ago! Thanks for fixing.