Possible issue with incx/incy for her2/hpr2/syr2/spr2 routines

CNugteren commented 8 years ago

I am having trouble understanding the results of the her2/hpr2/syr2/spr2 family of BLAS level-2 routines when using increments other than 1 for the X or Y vector inputs. For example, I tried running the example_ssyr2.c example with vector increments.

As reference, the default case (as it is included in the repository) outputs the following:

A matrix:
 (   1.0) (   2.0) (   3.0) (   4.0) (   5.0)
 (   0.0) (   6.0) (   7.0) (   8.0) (   9.0)
 (   0.0) (   0.0) (  10.0) (  11.0) (  12.0)
 (   0.0) (   0.0) (   0.0) (  13.0) (  14.0)
 (   0.0) (   0.0) (   0.0) (   0.0) (  15.0)

Result:
 ( 101.0) ( 142.0) ( 183.0) ( 224.0) ( 265.0)
 (   0.0) ( 166.0) ( 187.0) ( 208.0) ( 229.0)
 (   0.0) (   0.0) ( 190.0) ( 191.0) ( 192.0)
 (   0.0) (   0.0) (   0.0) ( 173.0) ( 154.0)
 (   0.0) (   0.0) (   0.0) (   0.0) ( 115.0)
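For reference, SYR2 is the symmetric rank-2 update below, with only the upper triangle being written in this example. The values of alpha and Y are not shown in this thread; alpha = 10 and y = (5, 4, 3, 2, 1) are inferred from the printed result. Together with x = (1, 2, 3, 4, 5), they reproduce it entry by entry (1-based indices), for example:

```latex
A \leftarrow \alpha\, x y^{T} + \alpha\, y x^{T} + A,
\qquad
a_{ij} \leftarrow a_{ij} + \alpha\,(x_i y_j + y_i x_j) \quad (j \ge i)

a_{11} = 1 + 10\,(1 \cdot 5 + 5 \cdot 1) = 101, \qquad
a_{12} = 2 + 10\,(1 \cdot 4 + 5 \cdot 2) = 142, \qquad
a_{55} = 15 + 10\,(5 \cdot 1 + 1 \cdot 5) = 115
```

Because only x_i and x_j enter the update, how X is laid out in memory (its stride) should have no effect on the result.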

Then, I replace the following:

static const cl_float X[] = {
    1.0,
    2.0,
    3.0,
    4.0,
    5.0
};
static const int incx = 1;

with:

static const cl_float X[] = {
    1.0,
    1.0,
    2.0,
    2.0,
    3.0,
    3.0,
    4.0,
    4.0,
    5.0,
    5.0
};
static const int incx = 2;

After also multiplying both the buffer size and the size passed to clEnqueueWriteBuffer by incx, I expected the same results. Instead, I obtain:

Result:
 ( 101.0) ( 132.0) ( 163.0) ( 204.0) ( 255.0)
 (   0.0) ( 126.0) ( 117.0) ( 128.0) ( 159.0)
 (   0.0) (   0.0) (  70.0) (  51.0) (  62.0)
 (   0.0) (   0.0) (   0.0) (  13.0) (  14.0)
 (   0.0) (   0.0) (   0.0) (   0.0) (  15.0)

I expected exactly the same output, but that is not what I get. Perhaps I misunderstand the BLAS specification, or perhaps there is a bug in clBLAS? Is this case tested at all? The errors seem to appear only within 32-by-32 blocks, so for larger matrices the error is relatively smaller (or absent).

The same problem occurs when changing incy instead of incx.

This is with clBLAS 2.10 on an Intel CPU on Linux, but I see similar behaviour on OS X with an AMD GPU.

tingxingdong commented 8 years ago

I just checked out the source code. It seems that incx and incy are indeed taken into account: X is referenced as X[i * incx] everywhere.

Hmm, maybe something goes wrong further up the call stack.


Tingxing dong

MigMuc commented 7 years ago

I tested your modifications of the test program above, and now it works for that particular case (incx=2). It should have been fixed by PR #289.

CNugteren commented 7 years ago

That's a long time ago! Thanks for fixing.

clMathLibraries / clBLAS

Possible issue with incx/incy for her2/hpr2/syr2/spr2 routines #237