Closed dlewissandy closed 7 years ago
I have collected the performance data; see the attached srot'.zip.
BLAS implementation notes:
Design Choices:
When INCX and INCY are both 1, a linear model through the origin fits both the unsafe and streamed implementations. The summary results follow:
lm(formula = STREAM ~ 0 + N, data = dat[dat$INC == 1, ])
Residuals:
    Min      1Q  Median      3Q     Max
-9.3427 -0.8482  0.1529  0.1825 12.6360
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 1.203e-02  7.375e-05   163.1   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.03 on 38 degrees of freedom
Multiple R-squared: 0.9986, Adjusted R-squared: 0.9985
F-statistic: 2.661e+04 on 1 and 38 DF, p-value: < 2.2e-16
lm(formula = UNSAFE ~ 0 + N, data = dat[dat$INC == 1, ])
Residuals:
     Min       1Q   Median       3Q      Max
-0.76733 -0.06488  0.11720  0.14151  0.80901
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
N 9.910e-04  6.426e-06   154.2   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.264 on 38 degrees of freedom
Multiple R-squared: 0.9984, Adjusted R-squared: 0.9984
F-statistic: 2.379e+04 on 1 and 38 DF, p-value: < 2.2e-16
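The formula `STREAM ~ 0 + N` requests a least-squares fit with no intercept, so the single coefficient is the slope of a line through the origin. A minimal Python sketch of that estimator follows. The sizes and timings are hypothetical values generated from the reported STREAM slope, not the measurements in the attachment, and the function name is illustrative only.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for the model y = slope * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need at least one nonzero x")
    return sxy / sxx


# Hypothetical, noise-free data built from the reported STREAM coefficient.
ns = [1000, 2000, 5000, 10000]
times = [1.203e-02 * n for n in ns]

slope = fit_through_origin(ns, times)
print(f"estimated slope: {slope:.3e}")
```

With real, noisy timings the estimate would differ from the generating slope, and the residual summary reported by R describes that scatter.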
Asymptotically, as N gets large, the execution time of the streamed implementation relative to the unsafe implementation should approach:
1.203E-2
--------- = 12.1x
9.910E-4
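As a quick check of the arithmetic above, the following Python sketch recomputes the ratio from the two reported slopes. Because neither fitted model has an intercept, the predicted ratio is the same at every N under these models. The helper name and the sample sizes are illustrative assumptions.

```python
STREAM_SLOPE = 1.203e-02  # reported coefficient of N for STREAM (INC == 1)
UNSAFE_SLOPE = 9.910e-04  # reported coefficient of N for UNSAFE (INC == 1)


def predicted_ratio(n):
    """Predicted STREAM/UNSAFE time ratio at size n under the fitted models."""
    return (STREAM_SLOPE * n) / (UNSAFE_SLOPE * n)


# Hypothetical problem sizes; the ratio does not depend on them.
ratios = [predicted_ratio(n) for n in (10, 1_000, 100_000)]
print([f"{r:.1f}x" for r in ratios])
```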
When INCX or INCY is not 1, a multiple linear model in Z and N (again with no intercept) fits both the unsafe and streamed implementations. The summary results follow:
lm(formula = STREAM ~ 0 + Z + N, data = dat[dat$INC != 1, ])
Residuals:
    Min      1Q  Median      3Q     Max
-1.3373 -0.4261  0.1132  0.1987  1.6970
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
Z 7.892e-04  1.499e-05   52.66   <2e-16 ***
N 1.512e-02  2.101e-04   71.94   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5475 on 49 degrees of freedom
Multiple R-squared: 0.9986, Adjusted R-squared: 0.9985
F-statistic: 1.737e+04 on 2 and 49 DF, p-value: < 2.2e-16
lm(formula = UNSAFE ~ 0 + Z + N, data = dat[dat$INC != 1, ])
Residuals:
     Min       1Q   Median       3Q      Max
-0.55717 -0.15917  0.03626  0.12351  0.62059
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
Z 4.969e-04  6.669e-06   74.50   <2e-16 ***
N 4.076e-03  9.352e-05   43.59   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2436 on 49 degrees of freedom
Multiple R-squared: 0.9984, Adjusted R-squared: 0.9984
F-statistic: 1.569e+04 on 2 and 49 DF, p-value: < 2.2e-16
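The two-predictor models above (`~ 0 + Z + N`) are also least-squares fits without an intercept. A minimal Python sketch, solving the 2x2 normal equations directly, is shown below. Z is treated as an opaque second predictor as named in the R formula; the input values are hypothetical and generated from the reported STREAM coefficients, not taken from the attachment.

```python
def fit_two_predictors(zs, ns, ys):
    """Least-squares (a, b) for y = a*z + b*n, with no intercept term."""
    szz = sum(z * z for z in zs)
    snn = sum(n * n for n in ns)
    szn = sum(z * n for z, n in zip(zs, ns))
    szy = sum(z * y for z, y in zip(zs, ys))
    sny = sum(n * y for n, y in zip(ns, ys))
    det = szz * snn - szn * szn
    if det == 0:
        raise ValueError("predictors are collinear")
    # Cramer's rule on the normal equations.
    a = (szy * snn - sny * szn) / det
    b = (sny * szz - szy * szn) / det
    return a, b


# Hypothetical, noise-free data built from the reported STREAM coefficients.
zs = [2000, 6000, 5000, 40000]
ns = [1000, 2000, 5000, 10000]
ys = [7.892e-04 * z + 1.512e-02 * n for z, n in zip(zs, ns)]

coef_z, coef_n = fit_two_predictors(zs, ns, ys)
print(f"Z: {coef_z:.3e}  N: {coef_n:.3e}")
```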
In the limit as both N and INC become large, the Z term dominates both models, so the execution time of the streamed implementation relative to the unsafe implementation approaches:
7.892E-4
---------- = 1.59x (159%)
4.969E-4
This is well within the 100x target for this project milestone.
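To see how the predicted ratio behaves away from the limit, the sketch below evaluates both fitted models for given Z and N. It assumes only that Z and N are nonnegative; the function name and the sample inputs are hypothetical. Under these fitted coefficients the ratio falls from about 3.71 (N term only) toward about 1.59 (Z term dominant), so every predicted value stays far below 100x.

```python
# Reported coefficients for the INC != 1 models.
STREAM_Z, STREAM_N = 7.892e-04, 1.512e-02
UNSAFE_Z, UNSAFE_N = 4.969e-04, 4.076e-03

TARGET = 100.0  # the 100x milestone target


def relative_time(z, n):
    """Predicted STREAM/UNSAFE time ratio for predictors z and n."""
    stream = STREAM_Z * z + STREAM_N * n
    unsafe = UNSAFE_Z * z + UNSAFE_N * n
    return stream / unsafe


low_z = relative_time(0, 1)      # N term only
high_z = relative_time(1e9, 1)   # Z term dominates

for z in (0, 10, 1_000, 1e9):    # hypothetical Z values at n = 1
    r = relative_time(z, 1)
    print(f"Z/N = {z:g}: {r:.2f}x, within target: {r < TARGET}")
```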
Completed as DONE after approval of pull request #62
Documentation for the srot function can be found at BLAS.