dlewissandy / lambda-blas

Native Haskell implementation of the BLAS library
BSD 3-Clause "New" or "Revised" License
9 stars 2 forks source link

As a developer of scientific software, I need a native Haskell implementation of the scopy function so that I can copy vectors in a pure, type-safe, thread-safe manner. #34

Closed dlewissandy closed 7 years ago

dlewissandy commented 7 years ago

Documentation for the scopy function can be found at BLAS.

dlewissandy commented 7 years ago

This function has the mathematical specification:

forall n>0, 0<i<length sy
(sscal n sx incx sy incy)!i = 
   if (i `mod` incy) == 0 && i `div` incy<n 
       then if ( sign incx == sign incy ) 
                    then u!(i*incx)  
                    else u!((1-n+i )*incx-1)
      else v!i

When both both incx and incy have a magnitude of 1 and the same sign, then this corresponds to copying the first n elements of incx into the first n elements of y.

dlewissandy commented 7 years ago

BLAS Implementation notes:

dlewissandy commented 7 years ago

Design Choices:

dlewissandy commented 7 years ago

I have created a branch (ISSUE-34) and begun implementation.

dlewissandy commented 7 years ago

I have collected and attached the benchmark data for the scopy function Archive.zip

dlewissandy commented 7 years ago

In the case when INCX and INCY are 1, a linear model fits both the unsafe and streamed implementation. The summary results follow:

STREAM

lm(formula = STREAM ~ 0 + N, data = scopy[scopy$INC == 1, ])

Residuals:
     Min       1Q   Median       3Q      Max
-0.08339  0.03416  0.03743  0.04536  0.09703

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 2.152e-04  1.147e-06   187.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04712 on 38 degrees of freedom
Multiple R-squared:  0.9989,    Adjusted R-squared:  0.9989
F-statistic: 3.52e+04 on 1 and 38 DF,  p-value: < 2.2e-16

UNSAFE BLAS

lm(formula = UNSAFE ~ 0 + N, data = scopy[scopy$INC == 1, ])

Residuals:
     Min       1Q   Median       3Q      Max
-0.93515 -0.15444  0.02680  0.04407  0.97555

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 5.748e-04  7.474e-06   76.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3071 on 38 degrees of freedom
Multiple R-squared:  0.9936,    Adjusted R-squared:  0.9934
F-statistic:  5915 on 1 and 38 DF,  p-value: < 2.2e-16

Asymptotically as N gets large, the relative execution time for the streaming algorithm should approach

2.152e-04
----------  = 37%
5.748e-04
dlewissandy commented 7 years ago

In the case when INCX or INCY not 1, a multi-linear model fits both the unsafe and streamed implementation. The summary results follow:

STREAM

lm(formula = STREAM ~ 0 + N + Z, data = scopy[scopy$INC != 1,
    ])

Residuals:
     Min       1Q   Median       3Q      Max
-0.99223 -0.14711  0.04556  0.06993  1.39199

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 5.793e-03  1.467e-04   39.48   <2e-16 ***
Z 3.721e-04  1.046e-05   35.56   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3822 on 49 degrees of freedom
Multiple R-squared:  0.9961,    Adjusted R-squared:  0.996
F-statistic:  6281 on 2 and 49 DF,  p-value: < 2.2e-16

UNSAFE BLAS

lm(formula = UNSAFE ~ 0 + N + Z, data = scopy[scopy$INC != 1,
    ])

Residuals:
    Min      1Q  Median      3Q     Max
-0.6180 -0.1309 -0.0331  0.0333  1.0308

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 2.747e-03  9.646e-05   28.48   <2e-16 ***
Z 2.297e-04  6.879e-06   33.38   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2513 on 49 degrees of freedom
Multiple R-squared:  0.9943,    Adjusted R-squared:  0.9941
F-statistic:  4271 on 2 and 49 DF,  p-value: < 2.2e-16

In the limit as both N and INC become large, the relative execution time for the stream implementation approaches:

3.721e-04
---------- = 160%
2.297e-04

This is well within the 100x target for this project milestone.

dlewissandy commented 7 years ago

Closed as DONE after approval and merge of pull request #57