Closed dlewissandy closed 7 years ago
This function has the mathematical specification:
forall n>0, 0<i<length sy
(sscal n sx incx sy incy)!i =
if (i `mod` incy) == 0 && i `div` incy<n
then if ( sign incx == sign incy )
then u!(i*incx)
else u!((1-n+i )*incx-1)
else v!i
When both both incx and incy have a magnitude of 1 and the same sign, then this corresponds to copying the first n elements of incx into the first n elements of y.
BLAS Implementation notes:
Design Choices:
I have created a branch (ISSUE-34) and begun implementation.
I have collected and attached the benchmark data for the scopy function Archive.zip
In the case when INCX and INCY are 1, a linear model fits both the unsafe and streamed implementation. The summary results follow:
STREAM
lm(formula = STREAM ~ 0 + N, data = scopy[scopy$INC == 1, ])
Residuals:
Min 1Q Median 3Q Max
-0.08339 0.03416 0.03743 0.04536 0.09703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
N 2.152e-04 1.147e-06 187.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04712 on 38 degrees of freedom
Multiple R-squared: 0.9989, Adjusted R-squared: 0.9989
F-statistic: 3.52e+04 on 1 and 38 DF, p-value: < 2.2e-16
UNSAFE BLAS
lm(formula = UNSAFE ~ 0 + N, data = scopy[scopy$INC == 1, ])
Residuals:
Min 1Q Median 3Q Max
-0.93515 -0.15444 0.02680 0.04407 0.97555
Coefficients:
Estimate Std. Error t value Pr(>|t|)
N 5.748e-04 7.474e-06 76.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3071 on 38 degrees of freedom
Multiple R-squared: 0.9936, Adjusted R-squared: 0.9934
F-statistic: 5915 on 1 and 38 DF, p-value: < 2.2e-16
Asymptotically as N gets large, the relative execution time for the streaming algorithm should approach
2.152e-04
---------- = 37%
5.748e-04
In the case when INCX or INCY not 1, a multi-linear model fits both the unsafe and streamed implementation. The summary results follow:
STREAM
lm(formula = STREAM ~ 0 + N + Z, data = scopy[scopy$INC != 1,
])
Residuals:
Min 1Q Median 3Q Max
-0.99223 -0.14711 0.04556 0.06993 1.39199
Coefficients:
Estimate Std. Error t value Pr(>|t|)
N 5.793e-03 1.467e-04 39.48 <2e-16 ***
Z 3.721e-04 1.046e-05 35.56 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3822 on 49 degrees of freedom
Multiple R-squared: 0.9961, Adjusted R-squared: 0.996
F-statistic: 6281 on 2 and 49 DF, p-value: < 2.2e-16
UNSAFE BLAS
lm(formula = UNSAFE ~ 0 + N + Z, data = scopy[scopy$INC != 1,
])
Residuals:
Min 1Q Median 3Q Max
-0.6180 -0.1309 -0.0331 0.0333 1.0308
Coefficients:
Estimate Std. Error t value Pr(>|t|)
N 2.747e-03 9.646e-05 28.48 <2e-16 ***
Z 2.297e-04 6.879e-06 33.38 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2513 on 49 degrees of freedom
Multiple R-squared: 0.9943, Adjusted R-squared: 0.9941
F-statistic: 4271 on 2 and 49 DF, p-value: < 2.2e-16
In the limit as both N and INC become large, the relative execution time for the stream implementation approaches:
3.721e-04
---------- = 160%
2.297e-04
This is well within the 100x target for this project milestone.
Closed as DONE after approval and merge of pull request #57
Documentation for the scopy function can be found at BLAS.