giaf / blasfeo

Basic linear algebra subroutines for embedded optimization
Other
323 stars 88 forks source link

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

Closed FreyJo closed 6 years ago

FreyJo commented 6 years ago

Hey guys,

I found a bug in blasfeo_dgemv_t: I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE. Running the following example, I get wrong results using both HIGH PERFORMANCE and REFERENCE:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 9;
    int nu = 2;

    struct blasfeo_dmat A;
    struct blasfeo_dvec lambda;
    struct blasfeo_dvec lambda0;

    blasfeo_allocate_dmat(nx, nx, &A); //    
    blasfeo_allocate_dvec(nx+nu, &lambda);
    blasfeo_allocate_dvec(nx+nu, &lambda0);

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
    }
    blasfeo_dgese(nx, nx, 1.0, &A, 0, 0);

    blasfeo_print_dmat(nx, nx, &A, 0, 0);
    printf("lambda = \n");
    blasfeo_print_dvec(nx+nu, &lambda, 0);    

    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 0.0, &lambda0, 0, &lambda, 0); // recheck!
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+  nu, &lambda,0);

    return 0;
}

REFERENCE prints:

lambda: result =
3.600000e+01
3.600000e+01
1.070000e+02
1.070000e+02
3.160000e+02
3.160000e+02
9.390000e+02
9.390000e+02
2.804000e+03
9.000000e+00
1.000000e+01

whereas HP prints:

lambda: result =
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
2.960000e+02
9.000000e+00
1.000000e+01

The correct result should be:

lambda =

    36
    36
    36
    36
    36
    36
    36
    36
    36
     9
    10

I hope this helps fixing stuff!

tmmsartor commented 6 years ago

Thanks for reporting, I guess the problem is that you passed the same chunk of memory both for result z and for the multiplication vector x, this is forbidden also in standard BLAS implementation. The only difference is that REF_BLAS overwrites addition vector y with the result.

y := alpha A x + beta y // BLAS z := alpha A x + beta y // BLASFEO

Despite the fact y and z may be different in BLASFEO z and x must be different in both cases, i.e. in Netlib_BLAS implementation this is directly enforced by the language. In Fortran having aliased arguments breaks the language standard. Fun fact: this is also one of the reasons someone believe that Fortran is faster than C.

With the following modifications to your code I'm getting the expected both with HP and REF, if you need indeed to overwrite vector x, in case, you have to make a copy of the vector.

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
        blasfeo_dvecin1((double) 1, &lambda0, ii);
    }
   ...
    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 1.0, &lambda0, 0, &lambda0, 0);
    printf("lambda0: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda0, 0);
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda, 0);

lambda0: result = 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 3.700000e+01 1.000000e+00 1.000000e+00

lambda: result = 0.000000e+00 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00 6.000000e+00 7.000000e+00 8.000000e+00 9.000000e+00 1.000000e+01

Thanks again for reporting we will add more documentation about this kind of unexpected behaviors. Maybe it would also be helpful to add some check inside the routines to avoid those. ping @giaf

FreyJo commented 6 years ago

Oops, I should have checked this before.. But thanks a lot for your reply!

giaf commented 6 years ago

Yep it's true, x and z can not be the same. The reason is that, the elements of x will change as soon as the elements of z are computed. x is not stored internally, so if x and z are the same vector, you see that this can not work.

The same applies for most other routines (as e.g. dgemm), with the only exception of triangular matrices. In that case, it is possible to multiply or solver by a triangular matrix "in place" by performing the computations in the proper order. This comes from the triangular shape of the matrix: on the edge side, only one element of x is involved, and once the operation is performed, it can be overwritten with the result z. The standard BLAS interface acknowledge this by requiring to overwrite the result for triangular matrices; the BLASFEO interface allows for this, but does not force it.