dlewissandy / lambda-blas

Native Haskell implementation of the BLAS library
BSD 3-Clause "New" or "Revised" License

As a developer of scientific software, I need a native Haskell implementation of the srot function so that I can compute plane rotations of points in a pure, type-safe, thread-safe manner. #37

Closed: dlewissandy closed this issue 7 years ago

dlewissandy commented 7 years ago

Documentation for the srot function can be found in the BLAS reference documentation.
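As a reminder of what srot computes: for each index i it replaces x_i with c·x_i + s·y_i and y_i with c·y_i − s·x_i, using the original values on both right-hand sides (as in the reference BLAS). Below is a minimal pure sketch for the unit-stride case using plain lists. `srotList` is a hypothetical name, not the lambda-blas API.

```haskell
-- Hypothetical helper, not part of lambda-blas: applies the BLAS plane
-- rotation to two unit-stride vectors. Each pair (x, y) becomes
-- (c*x + s*y, c*y - s*x), computed from the original x and y.
srotList :: Float -> Float -> [Float] -> [Float] -> ([Float], [Float])
srotList c s xs ys = unzip (zipWith rot xs ys)
  where
    rot x y = (c * x + s * y, c * y - s * x)
```

`zipWith` stops at the shorter list, which stands in for the explicit N argument of the BLAS routine. For example, the quarter turn `srotList 0 1 [1,2] [3,4]` evaluates to `([3.0,4.0],[-1.0,-2.0])`.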

dlewissandy commented 7 years ago

I have collected the performance data; see the attached file srot'.zip.

dlewissandy commented 7 years ago

BLAS implementation notes:
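For comparison, the netlib reference srot walks SX and SY with separate indices when INCX or INCY is not 1, and a negative increment starts the walk at the far end of the vector (element (−N+1)·INC+1 in Fortran's 1-based indexing). A sketch of that index arithmetic in Haskell, assuming 0-based immutable arrays from the `array` package that ships with GHC and nonzero increments; `strideIndices` and `srotStrided` are hypothetical names:

```haskell
import Data.Array

-- Hypothetical helper: the 0-based positions visited by a BLAS-style
-- strided loop over n elements with increment inc. As in the reference
-- BLAS, a negative increment starts at the far end of the vector.
-- Assumes inc /= 0.
strideIndices :: Int -> Int -> [Int]
strideIndices n inc
  | n <= 0    = []
  | otherwise = [start + k * inc | k <- [0 .. n - 1]]
  where
    start = if inc < 0 then (1 - n) * inc else 0

-- Hypothetical pure srot over 0-based arrays; argument order mirrors
-- SROT(N, SX, INCX, SY, INCY, C, S). Elements not visited by the
-- strides are left unchanged. Both updates read the original arrays.
srotStrided :: Int -> Array Int Float -> Int -> Array Int Float -> Int
            -> Float -> Float -> (Array Int Float, Array Int Float)
srotStrided n xs incx ys incy c s = (xs // xUpdates, ys // yUpdates)
  where
    pairs    = zip (strideIndices n incx) (strideIndices n incy)
    xUpdates = [ (i, c * xs ! i + s * ys ! j) | (i, j) <- pairs ]
    yUpdates = [ (j, c * ys ! j - s * xs ! i) | (i, j) <- pairs ]
```

Because `//` copies the array, this favours purity over speed; it only illustrates the indexing rules, not a tuned implementation.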

dlewissandy commented 7 years ago

Design Choices:

dlewissandy commented 7 years ago

When INCX and INCY are both 1, a linear model fits both the unsafe and the streamed implementations. The summary results follow:

STREAM

lm(formula = STREAM ~ 0 + N, data = dat[dat$INC == 1, ])

Residuals:
    Min      1Q  Median      3Q     Max
-9.3427 -0.8482  0.1529  0.1825 12.6360

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 1.203e-02  7.375e-05   163.1   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.03 on 38 degrees of freedom
Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9985
F-statistic: 2.661e+04 on 1 and 38 DF,  p-value: < 2.2e-16

UNSAFE BLAS

lm(formula = UNSAFE ~ 0 + N, data = dat[dat$INC == 1, ])

Residuals:
     Min       1Q   Median       3Q      Max
-0.76733 -0.06488  0.11720  0.14151  0.80901

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 9.910e-04  6.426e-06   154.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.264 on 38 degrees of freedom
Multiple R-squared:  0.9984,    Adjusted R-squared:  0.9984
F-statistic: 2.379e+04 on 1 and 38 DF,  p-value: < 2.2e-16
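The `~ 0 + N` formula asks R's `lm` for a line through the origin, time ≈ β·N, so each reported estimate is the least-squares slope β = Σ Nᵢtᵢ / Σ Nᵢ². For models without an intercept, R also reports an uncentred R² (1 − RSS / Σ tᵢ²), which partly explains the very high R² values above. A sketch of both computations in Haskell; `slopeThroughOrigin` and `uncenteredR2` are hypothetical names, and the time units are whatever the benchmark recorded:

```haskell
-- Least-squares slope of a line through the origin, t ≈ beta * n,
-- which is what R's lm(t ~ 0 + n) estimates: sum(n*t) / sum(n^2).
slopeThroughOrigin :: [(Double, Double)] -> Double
slopeThroughOrigin obs = sxy / sxx
  where
    sxy = sum [n * t | (n, t) <- obs]
    sxx = sum [n * n | (n, _) <- obs]

-- R reports an uncentred R^2 for no-intercept models:
-- 1 - RSS / sum(t^2).
uncenteredR2 :: [(Double, Double)] -> Double
uncenteredR2 obs = 1 - rss / sum [t * t | (_, t) <- obs]
  where
    beta = slopeThroughOrigin obs
    rss  = sum [(t - beta * n) ^ (2 :: Int) | (n, t) <- obs]
```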

Asymptotically, as N gets large, the relative execution time of the streaming implementation should approach

$$\frac{1.203\times10^{-2}}{9.910\times10^{-4}} \approx 12.1\times$$

dlewissandy commented 7 years ago

When INCX or INCY is not 1, a multiple linear model fits both the unsafe and the streamed implementations. The summary results follow:

STREAM

lm(formula = STREAM ~ 0 + Z + N, data = dat[dat$INC != 1, ])

Residuals:
    Min      1Q  Median      3Q     Max
-1.3373 -0.4261  0.1132  0.1987  1.6970

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
Z 7.892e-04  1.499e-05   52.66   <2e-16 ***
N 1.512e-02  2.101e-04   71.94   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5475 on 49 degrees of freedom
Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9985
F-statistic: 1.737e+04 on 2 and 49 DF,  p-value: < 2.2e-16

UNSAFE BLAS

lm(formula = UNSAFE ~ 0 + Z + N, data = dat[dat$INC != 1, ])

Residuals:
     Min       1Q   Median       3Q      Max
-0.55717 -0.15917  0.03626  0.12351  0.62059

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
Z 4.969e-04  6.669e-06   74.50   <2e-16 ***
N 4.076e-03  9.352e-05   43.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2436 on 49 degrees of freedom
Multiple R-squared:  0.9984,    Adjusted R-squared:  0.9984
F-statistic: 1.569e+04 on 2 and 49 DF,  p-value: < 2.2e-16
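The `~ 0 + Z + N` formula is the same idea with two predictors and no intercept: the estimates solve the 2×2 normal equations. A Cramer's-rule sketch in Haskell; `fitZN` is a hypothetical name, and Z is treated as an opaque predictor because the thread does not define it:

```haskell
-- Least-squares fit of t ≈ a*z + b*n with no intercept, as in
-- lm(t ~ 0 + z + n). Solves the 2x2 normal equations by Cramer's rule
-- and returns Nothing when the design is exactly singular
-- (z and n collinear).
fitZN :: [(Double, Double, Double)] -> Maybe (Double, Double)
fitZN obs
  | det == 0  = Nothing
  | otherwise = Just ( (szt * snn - snt * szn) / det
                     , (snt * szz - szt * szn) / det )
  where
    s f = sum (map f obs)
    szz = s (\(z, _, _) -> z * z)
    snn = s (\(_, n, _) -> n * n)
    szn = s (\(z, n, _) -> z * n)
    szt = s (\(z, _, t) -> z * t)
    snt = s (\(_, n, t) -> n * t)
    det = szz * snn - szn * szn
```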

In the limit as both N and INC become large, the relative execution time of the stream implementation approaches

$$\frac{7.892\times10^{-4}}{4.969\times10^{-4}} \approx 1.59 \;(159\%)$$

This is well within the 100x target for this project milestone.
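Spelling out that limit: write a and b for the Z and N coefficients of each fit. The predicted ratio of execution times is then

$$\frac{T_{\text{stream}}}{T_{\text{unsafe}}} = \frac{a_{\text{stream}}\,Z + b_{\text{stream}}\,N}{a_{\text{unsafe}}\,Z + b_{\text{unsafe}}\,N} \;\longrightarrow\; \frac{a_{\text{stream}}}{a_{\text{unsafe}}} \approx 1.59 \quad \text{as } Z/N \to \infty.$$

This assumes Z grows faster than N as INC increases (the thread does not define Z). If N dominated instead, the same fits would give b_stream / b_unsafe = 1.512E-2 / 4.076E-3 ≈ 3.71, which is also well inside the 100x target.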

dlewissandy commented 7 years ago

Completed as DONE after approval of pull request #62