dlewissandy / lambda-blas

Native Haskell implementation of the BLAS library
BSD 3-Clause "New" or "Revised" License
9 stars 2 forks source link

As a developer of scientific software, I need a native Haskell implementation of the sscal function so that I can compute the vector/scalar product in a pure, type-safe, thread-safe manner. #42

Closed dlewissandy closed 7 years ago

dlewissandy commented 7 years ago

Documentation for the sscal function can be found at BLAS.

dlewissandy commented 7 years ago

This function has the following mathematical specification:

forall n>0, incx>0, 0<= i <= (n-1)*inc
(sscal n sa sx incx)!i = 
   if (i `mod` incx) == 0 then sa * u!i  else u!i

When incx = 1, this corresponds to the vectorspace operation of scalar multiplication.

dlewissandy commented 7 years ago

BLAS Implementation notes:

dlewissandy commented 7 years ago

Design Choices:

dlewissandy commented 7 years ago

I have created a branch (ISSUE-42) to begin the implementation of this function.

dlewissandy commented 7 years ago

I have collected the time performance data for the native implementation. I am attaching the raw data to this post sscal.csv.zip

dlewissandy commented 7 years ago

I anticipate that for inc=1 that the time performance of the stream based implementation will be proportional to the vector length, or O(inc*(n-1)). For the unsafe implementation, I expect a time complexity proportional to the number of scaled entries, O(n).

Summary statistics for the fit of the native implementation to the model time(n,z) = an + bz + c where z = (n-1)*inc follow:


Residuals:
    Min      1Q  Median      3Q     Max
-5.4225 -0.0249 -0.0064  0.1040  5.9975

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.014e-02  1.460e-01   0.343    0.732
N           8.098e-04  3.811e-05  21.246   <2e-16 
Z           5.960e-04  2.410e-05  24.733   <2e-16 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.21 on 87 degrees of freedom
Multiple R-squared:  0.9672,    Adjusted R-squared:  0.9664
F-statistic:  1282 on 2 and 87 DF,  p-value: < 2.2e-16
dlewissandy commented 7 years ago

Summary statistics for the unsafe implementation to the model time(n,z)=an+bz+c where z=(n-1)*inc follow:

lm(formula = UNSAFE ~ N + Z, data = sscal)

Residuals:
     Min       1Q   Median       3Q      Max
-2.25448 -0.03327  0.04609  0.05948  2.55762

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.014e-02  6.230e-02   0.163    0.871
N           1.175e-04  1.627e-05   7.223 1.82e-10 ***
Z           3.178e-04  1.029e-05  30.893  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5167 on 87 degrees of freedom
Multiple R-squared:  0.957, Adjusted R-squared:  0.956
F-statistic:   968 on 2 and 87 DF,  p-value: < 2.2e-16
dlewissandy commented 7 years ago

In both cases, the P value for the coefficients of N and Z are small, indicating that we can reject the null hypothesis that a and b are zero. The fact that the coefficient for Z is non-zero for the unsafe model is somewhat suprising, and may be an indication that the cost of marshaling a mutable copy of the vector has not been separated from the cost of running the algorithm.

From this data, we can expect that the stream based implementation will have an asymptotic relative execution time given by the following expression:

8.1E-4  + 6E-4 *INC
---------------------- 
1.2E-4 + 3.2E-4 * INC

This corresponds to about 6.7x the unsafe call when INC is equal to 1, and about 1.875x the unsafe call in the case when both N and INC are very large. These numbers are well within the 100x goal for this project milestone.

dlewissandy commented 7 years ago
dlewissandy commented 7 years ago

The foreign calls to sscal currently use the thaw function to create a mutable vector for use in the benchmarking. I have temporarily replaced this call with unsafeThaw to test if it eliminates the unexpected dependence of the time performance on Z. The regression results for the case when INC=1 are as follows:

lm(formula = UNSAFE_NOCOPY ~ 0 + N, data = unrolled)

Residuals:
      Min        1Q    Median        3Q       Max
-0.274000  0.005778  0.035665  0.038943  0.155501

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 1.348e-04  2.331e-06   57.85   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09576 on 38 degrees of freedom
Multiple R-squared:  0.9888,    Adjusted R-squared:  0.9885
F-statistic:  3347 on 1 and 38 DF,  p-value: < 2.2e-16

and When INC>1

all:
lm(formula = UNSAFE_NOCOPY ~ 0 + N, data = rolled)

Residuals:
     Min       1Q   Median       3Q      Max
-0.53339 -0.01117  0.02548  0.03646  0.32093

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 1.279e-03  3.615e-05   35.38   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1493 on 50 degrees of freedom
Multiple R-squared:  0.9616,    Adjusted R-squared:  0.9608
F-statistic:  1252 on 1 and 50 DF,  p-value: < 2.2e-16

The unexplained Z dependence has been eliminated :)

dlewissandy commented 7 years ago

Issue completed with the approval of pull request #56 and merged into master.