waynelapierre closed this issue 2 years ago.
I'm not sure that's very useful. The results will depend hugely on what exactly you do. In particular, are you calling back from the compiled language to R or not?
In any case, extendr is following the design principle of safety and convenience over performance, so in many applications Rcpp code may be faster, in particular if you're going back-and-forth frequently between R and Rust. On the other hand, for numerically intensive code that does many operations in compiled code, the results are comparable. (In my example below, Rust is slightly faster, but it's basically a wash.)
library("Rcpp")
library("rextendr")

cppFunction("
double computation_rcpp(unsigned int n = 1000) {
    double x1 = 0.1;
    double x2 = 1.5;
    double z = 1;
    for (unsigned int i = 1; i <= n; i++) {
        z = x2*z - x1;
        if (z > 10) {
            z = z - 9;
        }
    }
    return z;
}")
rust_function("fn computation_rust(n: u32) -> f64 {
    let x1 = 0.1;
    let x2 = 1.5;
    let mut z = 1.0;
    let mut i = 1;
    loop {
        z = x2*z - x1;
        if z > 10. {
            z = z - 9.;
        }
        i = i + 1;
        if i > n {
            break;
        }
    }
    z
}", profile = "release")
#> build directory: /var/folders/b1/13gn4j655jddkfhxmtk5tsfm0000gn/T//RtmpdxqKfc/file17cfd6a138c73
computation_rcpp(100000)
#> [1] 5.552521
computation_rust(100000)
#> [1] 5.552521
microbenchmark::microbenchmark(
    computation_rcpp(10),
    computation_rust(10),
    computation_rcpp(1000),
    computation_rust(1000),
    computation_rcpp(100000),
    computation_rust(100000)
)
#> Unit: nanoseconds
#>                     expr    min       lq      mean   median       uq      max neval cld
#>     computation_rcpp(10)   1187   1682.5   2886.35   2509.5   3425.5    10110   100   a
#>     computation_rust(10)    926   1092.5 191324.87   1332.5   1645.5 18937742   100  ab
#>   computation_rcpp(1000)   5005   6550.0  20419.67   7069.5   7758.5  1307997   100   a
#>   computation_rust(1000)   4874   5552.0   6780.06   6163.0   6737.5    32385   100   a
#>  computation_rcpp(1e+05) 392206 433195.0 476588.15 465732.0 527082.0   561250   100   b
#>  computation_rust(1e+05) 394237 425857.5 472266.51 463193.5 525485.5   571115   100   b
Created on 2021-01-06 by the reprex package (v0.3.0)
However, when calling the function many times but with a small computational workload, Rust is almost twice as fast as Rcpp. I hadn't quite expected this. @andy-thomason Looks like starting a new thread for each Rust function is not a big deal.
microbenchmark::microbenchmark(
    for (i in 1:10000) { computation_rcpp(1) },
    for (i in 1:10000) { computation_rust(1) }
)
#> Unit: milliseconds
#>                                        expr      min       lq     mean   median       uq       max neval cld
#>  for (i in 1:10000) { computation_rcpp(1) } 14.04551 20.24895 24.38897 23.81935 25.79039 158.66692   100   b
#>  for (i in 1:10000) { computation_rust(1) } 10.32471 11.97144 13.01768 12.76082 13.75396  17.49498   100   a
This is good news.
It all depends on the target architecture and how vectorisable the loop is.
Godbolt gives:
example::computation_rust:
        vmovsd  xmm0, qword ptr [rip + .LCPI0_0]
        mov     eax, 1
        vmovsd  xmm1, qword ptr [rip + .LCPI0_1]
        vmovsd  xmm2, qword ptr [rip + .LCPI0_2]
        vmovsd  xmm3, qword ptr [rip + .LCPI0_3]
        vmovsd  xmm4, qword ptr [rip + .LCPI0_4]
.LBB0_1:
        vmulsd  xmm0, xmm0, xmm1
        vaddsd  xmm0, xmm0, xmm2
        vaddsd  xmm5, xmm0, xmm3
        vcmpltsd xmm6, xmm4, xmm0
        vblendvpd xmm0, xmm0, xmm5, xmm6
        inc     eax
        cmp     eax, edi
        jbe     .LBB0_1
        ret
This is not vectorised, but the condition has at least been folded into a `vblendvpd`. Most of the problem is the running product which, by IEEE rules, prevents the re-ordering of the accumulation of `z`.
Certainly, LLVM's code generation is streets ahead of GCC's.
I was planning to add fast summation that allows loops to vectorise to RobjItertools. We may have to consider a rayon feature as time goes by.
This is a slightly better version of the loop:
pub fn computation_rust(n: u32) -> f64 {
    let x1 = 0.1;
    let x2 = 1.5;
    let mut z = 1.0;
    for i in 1..n+1 {
        z = x2*z - x1;
        if z > 10. {
            z = z - 9.;
        }
    }
    z
}
It allows the loop to unroll as the trip count is more predictable.
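To see that the two loop shapes really are interchangeable, here is a self-contained sketch (not from the thread; the function names `computation_manual` and `computation_range` are mine) that runs the same recurrence with both structures and checks they agree:

```rust
// Self-contained comparison of the two loop shapes discussed above.
// Both execute the recurrence z = 1.5*z - 0.1 (with the >10 correction)
// exactly `n` times; only the loop structure differs.
fn computation_manual(n: u32) -> f64 {
    let (x1, x2) = (0.1, 1.5);
    let mut z = 1.0;
    let mut i = 1;
    loop {
        z = x2 * z - x1;
        if z > 10. {
            z -= 9.;
        }
        i += 1;
        if i > n {
            break;
        }
    }
    z
}

fn computation_range(n: u32) -> f64 {
    let (x1, x2) = (0.1, 1.5);
    let mut z = 1.0;
    // The trip count is explicit here, which makes unrolling easier
    // for the optimizer.
    for _ in 0..n {
        z = x2 * z - x1;
        if z > 10. {
            z -= 9.;
        }
    }
    z
}

fn main() {
    // Same floating-point operations in the same order, so the
    // results agree bit-for-bit (for n >= 1).
    assert_eq!(computation_manual(100_000), computation_range(100_000));
    println!("{}", computation_range(100_000));
}
```

The only observable difference is to the optimizer, not to the result.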
I had tried the for loop but on my machine it was slower than the C++ code. :-)
It's been a long while since this post was updated (sorry to necropost), but I'd like to share some findings on the overhead of using Rcpp vs Rust for returning a basic zero-filled vector and matrix from both...
This gist of an R script compiles the Rust and Rcpp versions as similarly as I could make them and iterates N from 10 to 640 for length on the vectors and each dim 10x10 to 640x640 per matrix.
For example, the simple vector...
IntegerVector rcpp_zeros_intvec(int n) {
    IntegerVector my_vec(n);
    return my_vec;
}

fn rust_zeros_intvec(n: i32) -> Robj {
    let my_vec = vec!(0; n as usize);
    r!(my_vec)
}
It shows that there is a significant negative impact from the extendr handling... The impact is only slightly noticeable at N = 10:
N = 10
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_intvec(N) 1.713 1.923 3.097295 2.104 2.405 739.202 1000 a
rust_zeros_intvec(N) 1.673 1.913 3.294399 2.054 2.225 1149.976 1000 a
rcpp_zeros_intmat(N) 1.994 2.254 3.541405 2.455 2.815 752.757 1000 a
rust_zeros_intmat(N) 3.177 3.697 5.001572 3.988 4.288 811.630 1000 a
rcpp_zeros_intvec_dimmed(N) 1.984 2.254 3.596928 2.454 2.825 763.548 1000 a
but it doesn't take long for the impact to become very noticeable:
N = 640
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_intvec(N) 2.044 4.3385 11.40082 7.3290 16.0310 326.103 1000 a
rust_zeros_intvec(N) 2.925 6.6530 21.90912 21.7670 29.2960 171.127 1000 a
rcpp_zeros_intmat(N) 42.030 174.7990 749.19897 409.9275 513.1500 196941.654 1000 b
rust_zeros_intmat(N) 1227.463 1593.6380 2183.07436 1851.1095 2254.8875 21488.223 1000 c
rcpp_zeros_intvec_dimmed(N) 39.495 178.1705 534.83684 412.4355 530.6185 15211.649 1000 b
Whether that's from copies or from handling the owners or something else, I don't have the knowledge to know, so the folks on Discord recommended posting it here.
`rcpp_zeros_intvec_dimmed` is included without a Rust comparison because I don't yet know of a directly comparable extendr function.

Which version of extendr is this using? The one currently released to crates.io is completely out of date. Unfortunately, the next one is not quite ready for release yet.
In any case, it would be useful to try again with the latest version and using the new wrappers that have been written. In particular in the vector example, I'm quite certain that the rust code makes an additional copy that the C++ code doesn't make, and I think this copy can be avoided with the development version of extendr.
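To make the suspected extra copy concrete, here is a plain-Rust sketch (no extendr types involved; `destination` merely stands in for the R-owned buffer, and both function names are made up for illustration):

```rust
// Path A: build an intermediate Vec, then copy it into the destination.
// This is the pattern suspected above: one allocation + zero fill for
// the temporary, then a second pass to copy it into R's buffer.
fn via_intermediate_vec(n: usize, destination: &mut Vec<i32>) {
    let tmp = vec![0; n];                // temporary allocation + fill
    destination.clear();
    destination.extend_from_slice(&tmp); // the extra copy
}

// Path B: write into the destination directly, in a single pass,
// which is roughly what the C++ code gets to do.
fn direct(n: usize, destination: &mut Vec<i32>) {
    destination.clear();
    destination.resize(n, 0);
}

fn main() {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    via_intermediate_vec(8, &mut a);
    direct(8, &mut b);
    // Same result either way; Path A just touches the data twice.
    assert_eq!(a, b);
    println!("both paths produce {:?}", a);
}
```

The outputs are identical; the difference is purely the extra allocation and pass over the data.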
Matrix code probably needs more work, but it's low priority until the basic features are settled.
Which version of extendr is this using?

The latest, via GitHub: `extendr-api = list(git = "https://github.com/extendr/extendr")`
I can see where an additional copy might need to be made considering the 'safety' aspect of ownership, but would one expect a copy to incur that much expense? Granted, Rcpp is much more mature and has had many iterations of performance improvements over time. I don't have the Rust experience to grok what I read without a glossary yet, so all I can offer is the results of this exceedingly simple test.
Ok. In this case, for the numeric vector, you should try something like the following:
fn rust_zeros_intvec(n: i32) -> Robj {
    let my_vec: Doubles = (0..n).map(|_| (0.).into()).collect();
    my_vec
}
I haven't tested this, so there may be some problem with this code. Example from here: https://github.com/extendr/extendr/blob/1264ad5ae3c491247878470aceed584ba9a87844/extendr-api/src/wrapper/doubles.rs#L40
Sorry, that's making a vector of doubles, not integers. But there's comparable code for integers also. See e.g. here: https://github.com/extendr/extendr/blob/1264ad5ae3c491247878470aceed584ba9a87844/extendr-api/src/wrapper/integers.rs#L40
Here is my attempt at {cpp11}. Our results are not impressive, to be honest. I believe `.into()` for every operation creates significant overhead. @andy-thomason?
cpp11::cpp_function(
  "writable::doubles make_cpp(int n) {
    writable::doubles x(n);
    for (int i = 0; i < n; i++)
      x[i] = i;
    return x;
  }"
)
rextendr::rust_function(
  "fn make_rust(n: i32) -> Doubles {
    (0..n).map(|x| (x as f64).into()).collect::<Doubles>()
  }",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  )
)
#> i build directory: 'C:/Users/[redacted]/AppData/Local/Temp/Rtmpu4wuuD/file613c76cd2e14'
#> v Writing 'C:/Users/[redacted]/AppData/Local/Temp/Rtmpu4wuuD/file613c76cd2e14/target/extendr_wrappers.R'.
c(10, 100, 1000, 10000) |>
  purrr::map_dfr(
    \(n) bench::mark(make_rust(n), make_cpp(n)) |>
      dplyr::mutate(N = n)
  ) |>
  dplyr::select(expression, N, median, `itr/sec`, dplyr::everything())
#> # A tibble: 8 x 7
#> expression N median `itr/sec` min mem_alloc `gc/sec`
#> <bch:expr> <dbl> <bch:tm> <dbl> <bch:tm> <bch:byt> <dbl>
#> 1 make_rust(n) 10 4.2us 227781. 3.9us 0B 0
#> 2 make_cpp(n) 10 500ns 1541414. 400ns 0B 154.
#> 3 make_rust(n) 100 9.6us 101023. 9us 848B 0
#> 4 make_cpp(n) 100 700ns 1191838. 500ns 848B 0
#> 5 make_rust(n) 1000 61.7us 15951. 57.8us 7.86KB 2.02
#> 6 make_cpp(n) 1000 1.1us 601655. 900ns 7.86KB 120.
#> 7 make_rust(n) 10000 570.2us 1743. 551.9us 78.17KB 2.01
#> 8 make_cpp(n) 10000 5.5us 172149. 5us 78.17KB 259.
Created on 2021-12-21 by the reprex package (v2.0.1)
Ok so, this is a typical case of RTFI (a poor choice of words, directed at myself), i.e. not paying attention to the docs. By default, {rextendr} compiles the dev profile, so in my previous message I quoted incorrect results. Here is release:
cpp11::cpp_function(
  "writable::doubles make_cpp(int n) {
    writable::doubles x(n);
    for (int i = 0; i < n; i++)
      x[i] = i;
    return x;
  }"
)
rextendr::rust_function(
  "fn make_rust(n: i32) -> Doubles {
    (0..n).map(|x| (x as f64).into()).collect::<Doubles>()
  }",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
  profile = "release"
)
#> i build directory: 'C:/Users/[redacted]/AppData/Local/Temp/RtmpUH51JB/file4278337c5e69'
#> v Writing 'C:/Users/[redacted]/AppData/Local/Temp/RtmpUH51JB/file4278337c5e69/target/extendr_wrappers.R'.
c(10, 100, 1000, 10000, 100000) |>
  purrr::map_dfr(
    \(n) bench::mark(make_rust(n), make_cpp(n)) |>
      dplyr::mutate(N = n)
  ) |>
  dplyr::select(expression, N, median, `itr/sec`, dplyr::everything())
#> # A tibble: 10 x 7
#> expression N median `itr/sec` min mem_alloc `gc/sec`
#> <bch:expr> <dbl> <bch:tm> <dbl> <bch:tm> <bch:byt> <dbl>
#> 1 make_rust(n) 10 700ns 1248003. 500ns 0B 0
#> 2 make_cpp(n) 10 500ns 1527708. 400ns 0B 153.
#> 3 make_rust(n) 100 900ns 973179. 700ns 848B 0
#> 4 make_cpp(n) 100 700ns 1122990. 500ns 848B 0
#> 5 make_rust(n) 1000 1.8us 409342. 1.5us 7.86KB 40.9
#> 6 make_cpp(n) 1000 1us 853239. 900ns 7.86KB 171.
#> 7 make_rust(n) 10000 17.9us 49989. 10.9us 78.17KB 75.1
#> 8 make_cpp(n) 10000 5.6us 158774. 5us 78.17KB 254.
#> 9 make_rust(n) 100000 112.8us 7039. 106.4us 781.3KB 116.
#> 10 make_cpp(n) 100000 52.6us 16082. 48.1us 781.3KB 263.
Created on 2021-12-21 by the reprex package (v2.0.1)
Ok so, this is a typical case of RTFI. By default {rextendr} compiles the dev profile. Here is release.
I added that `make_rust` to my local copy of that linked gist and it doesn't fare any better or worse than the intvec. And, yes, as shown in that gist, it's set to the release profile.
I figured I'd save the reposting... but since that link seems to have been overlooked...
rextendr::rust_source(
  profile = "release",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
N = 640
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_intvec(N) 2.084 3.9675 10.22898 6.4270 12.4890 501.043 1000 a
rust_zeros_intvec(N) 2.845 5.6905 16.50350 12.7440 24.6870 238.985 1000 a
rcpp_zeros_intmat(N) 42.460 123.5980 391.89065 364.3910 430.3995 10510.080 1000 b
rust_zeros_intmat(N) 1019.979 1570.3000 2054.20790 1779.4945 2081.5370 12317.982 1000 c
rcpp_zeros_intvec_dimmed(N) 38.522 145.0150 457.99592 377.4905 441.6065 11752.041 1000 b
make_rust(N) 3.096 4.8940 15.63082 11.3865 23.9800 113.514 1000 a
@Thell, I responded to my (now hidden) message where I ran the dev version of the Rust code, got terrible results, and started to complain about performance before realizing it was my fault all along. I used 'RTFI' towards myself because I was so frustrated that I'd made this childish mistake. It was a poor choice of words anyway. I apologize.
Speaking of your gist, here are things that may affect the result:

- Does `IntegerVector my_vec(n)` initialize the underlying buffer? When you do `vec!(0; n as usize)`, I suspect you have an extra `zeromem()` call, at least. Even if R does it itself, in Rust you have an additional operation.
- Using `r!(...)` is not the best approach performance-wise. We have a new transforming API, `try_from`, which can currently be enabled as `#[extendr(use_try_from = true)]`. It will be enabled by default in a future release.
- As I mentioned above, matrices are sub-optimal right now, and we have not really touched them lately. Could you also try using `ndarray` to see if there is any difference?

It is possible that we may never reach raw C++ performance, simply because we have additional overhead from type safety, idiomatic Rust `NA` and `NULL` handling, and other features.
For matrix initialization, I guess this is the cost of calling the initializer function for every element. Perhaps we need to have a default constructor that takes only dimensions.
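The two construction strategies can be sketched in plain Rust (the `Matrix` type and both constructor names here are hypothetical stand-ins, not extendr's actual `RMatrix` API): a per-element closure call versus a single bulk fill that the compiler can lower to a memset.

```rust
// Hypothetical sketch: per-element closure construction vs bulk fill.
// `Matrix` is a plain column-major buffer, standing in for an R matrix.
struct Matrix {
    nrow: usize,
    ncol: usize,
    data: Vec<i32>,
}

impl Matrix {
    // Mirrors the existing constructor style: one closure call per element.
    fn from_fn(nrow: usize, ncol: usize, f: impl Fn(usize, usize) -> i32) -> Self {
        let mut data = Vec::with_capacity(nrow * ncol);
        for j in 0..ncol {
            for i in 0..nrow {
                data.push(f(i, j)); // a call per element
            }
        }
        Matrix { nrow, ncol, data }
    }

    // Proposed dimensions-only constructor: a single bulk zero fill.
    fn zeros(nrow: usize, ncol: usize) -> Self {
        Matrix { nrow, ncol, data: vec![0; nrow * ncol] }
    }
}

fn main() {
    let a = Matrix::from_fn(3, 4, |_, _| 0);
    let b = Matrix::zeros(3, 4);
    assert_eq!(a.data, b.data);
    println!("{} x {} zero matrix built both ways", b.nrow, b.ncol);
}
```

Both produce the same buffer; the dimensions-only form simply avoids a function call per element.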
Also, I think you are missing this function https://gist.github.com/Thell/861d464c7c85ffd4c7285a894b7715f4#file-rcpp_vs_rust_zeros-r-L53-L57. It would be interesting to see if it is faster than matrix construction from dimensions and initializer function.
- Does `IntegerVector my_vec(n)` initialize the underlying buffer? When you do `vec!(0; n as usize)`, I suspect you have an extra `zeromem()` call, at least. Even if R does it itself, in Rust you have an additional operation.
Yes, it initializes the buffer in the R memory pool. Earlier I mentioned that Rcpp has gone through performance improvements over time, and this looks like one of the things that has improved, because I just ran a comparison initializing and passing back a `std::vector`.
- Using `r!(...)` is not the best approach performance-wise. We have a new transforming API, `try_from`, which can currently be enabled as `#[extendr(use_try_from = true)]`. It will be enabled by default in a future release.
- As I mentioned above, matrices are sub-optimal right now, and we have not really touched them lately. Could you also try using `ndarray` to see if there is any difference?
I will look into that and add those to the gist.
Regarding "I think you are missing this function": I was hoping to find an equivalent to the Rcpp vector with dim attributes set, and that missing function was a false start whose only purpose is to remind me that I still need to find an equivalent extendr function.
Ok, I see. Still, very useful results -- we clearly need to improve matrix performance, and likely before the next release.
So... here's what I think the C++ and Rust versions of the dimmed vector -> matrix are (even though I'm not sure about returning the `Result<>` like that):
IntegerVector rcpp_0s_intmat_dimmedvec(int n) {
    IntegerVector my_vec(n * n);
    my_vec.attr("dim") = Dimension(n, n);
    return my_vec;
}

fn rust_0s_intmat_dimmedvec(n: i32) -> Result<Robj> {
    let my_vec = vec!(0; n as usize * n as usize);
    let my_mat = r!(my_vec).set_attrib(dim_symbol(), [n as usize, n as usize]);
    my_mat
}
Given the vector results from the previous benchmarks I would anticipate that since we are only setting a dim attribute that this rust function should perform competitively with the c++ function.
(This benchmark contains two tests for these functions: one at the size of the vector tests, denoted with `_v`, and one at the size of the matrix tests, denoted with `_m`.)
N = 640
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_stdvec(N) 1.913 4.4435 9.489698 6.3425 11.5020 274.539 1000 a
rust_zeros_stdvec(N) 2.675 5.3005 13.750623 8.6115 20.0980 378.368 1000 a
rcpp_zeros_intvec(N) 1.723 3.7170 8.420855 5.7010 10.6000 79.643 1000 a
rust_zeros_intvec(N) 2.655 4.9140 13.566919 8.2805 20.5245 410.924 1000 a
rust_zeros_integers(N) 3.066 6.0010 15.235590 8.9725 21.1305 362.564 1000 a
rcpp_0s_dimmed_intvec_v(N) 2.164 4.2480 9.673049 6.6425 12.1430 303.762 1000 a <-----
rust_0s_dimmed_intvec_v(N) 3.446 7.2140 17.250222 13.1300 21.9475 527.780 1000 a <-----
rcpp_zeros_intmat(N) 33.314 129.7135 301.950395 165.3915 389.0295 11359.324 1000 b
rust_zeros_intmat(N) 1068.576 1387.3960 1878.637816 1722.7925 2050.6155 18784.302 1000 d
rcpp_0s_dimmed_intvec_m(N) 32.533 136.5365 290.744776 170.4560 391.6250 8901.480 1000 b <-----
rust_0s_dimmed_intvec_m(N) 581.153 738.2130 1029.551299 889.0520 1074.6720 11822.131 1000 c <-----
As we can see from the `<-----` marked results, the Rcpp versions are right in line with their `intvec` and `intmat` counterparts, yet the Rust versions aren't: the `intvec` tests are quicker and the `intmat` tests are dramatically quicker.
Note that all we've done is add the dim attribute and return `Result<Robj>` instead of an unwrapped `Robj`.
Again, the expectation I'd have is that there should be an unnoticeable impact of adding the dim attribute.
As a sidebar test, I made an R function to call `rust_zeros_intvec` and add the dims in R:

r_dimmed_rust_intvec_m <- function(n) {
    res <- .Call("wrap__rust_zeros_intvec", n * n, PACKAGE = "librextendr27")
    dim(res) <- c(n, n)
    return(res)
}
and
N = 640
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_intvec(N * N) 40.427 147.2065 238.1183 162.857 182.1690 6458.753 1000 a
rust_zeros_intvec(N * N) 573.718 636.6740 987.8065 728.209 840.6595 154891.654 1000 b
rcpp_zeros_dimmed_intvec_m(N) 42.261 149.3860 256.9951 162.561 183.3860 7796.264 1000 a
rust_zeros_dimmed_intvec_m(N) 570.191 646.4225 825.5226 740.067 842.3170 9759.256 1000 b
r_dimmed_rust_intvec_m(N) 573.408 653.4255 873.8778 744.060 860.4870 8743.313 1000 b
Given these results I'm inclined to think the issue isn't pointing directly at matrix handling.
- Using `r!(...)` is not the best approach performance-wise. We have a new transforming API, `try_from`, which can currently be enabled as `#[extendr(use_try_from = true)]`. It will be enabled by default in a future release.
Is this the way to use `use_try_from`? I'm not sure where to verify that it's enabled and used... Looking at the results definitely didn't help... 😄
rextendr::rust_source(
  profile = "release",
  extendr_deps = list(
    `extendr-api` = list(git = "https://github.com/extendr/extendr")
  ),
  code = '
    /// @export
    #[extendr(use_try_from = true)]
    fn rust_zeros_stdvec_tf(n: i32) -> Vec<i32> {
        let my_vec = vec!(0; n as usize);
        my_vec
    }

    extendr_module! {
        mod rust_wrap;
        fn rust_zeros_stdvec_tf;
    }
')
> N<-640; microbenchmark(rcpp_zeros_stdvec(N*N), rust_zeros_stdvec(N*N), rust_zeros_stdvec_tf(N*N), times=1000)
Unit: microseconds
expr min lq mean median uq max neval cld
rcpp_zeros_stdvec(N * N) 147.833 190.8055 516.7946 218.6940 279.9210 160522.41 1000 a
rust_zeros_stdvec(N * N) 575.251 629.3245 881.4189 737.0965 864.4950 12784.64 1000 b
rust_zeros_stdvec_tf(N * N) 575.893 628.4230 900.2462 730.0025 857.8025 11044.42 1000 b
[edit]
Interesting... here's a cargo Rust release build:

cargo_zeros_intvec <- rust_fn(n, ... = 'let (n, xn) = Rval::new_vector_integer(n.as_usize(), &mut pc); for xni in xn.iter_mut() { *xni = 0 }; n')
N<-640; microbenchmark(cargo_zeros_intvec(N*N), times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
cargo_zeros_intvec(N * N) 336.964 382.865 577.2551 416.468 473.595 7546.172 1000
Bear in mind that converting from a `Vec<i32>` to an integer vector has some overhead. This is similar to returning a `std::vector<int>` in Rcpp - not the fastest thing.

Using `Integers` and `collect` should match the C++ performance, provided the compiler flags are set to sensible values. This essentially creates an R array with `Rf_allocVector` and fills it with values in an autovectorised loop. rextendr protects the R call with a thread guard, which Rcpp does not, and so will be slightly slower.
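As a toy model of that thread-guard overhead (plain Rust, nothing from extendr's actual implementation; the function names are made up): every guarded call pays a lock/unlock even when uncontended, which only matters when the guarded work itself is tiny, such as allocating a short vector.

```rust
use std::sync::Mutex;

// A global lock standing in for the guard around R API calls.
static GUARD: Mutex<()> = Mutex::new(());

// Unguarded allocation: just the work itself.
fn alloc_unguarded(n: usize) -> Vec<i32> {
    vec![0; n]
}

// Guarded allocation: identical result, plus a lock/unlock per call.
// For short vectors the lock can rival the cost of the allocation.
fn alloc_guarded(n: usize) -> Vec<i32> {
    let _lock = GUARD.lock().unwrap();
    vec![0; n]
}

fn main() {
    // The guard changes the cost, never the result.
    assert_eq!(alloc_guarded(16), alloc_unguarded(16));
    println!("guarded and unguarded allocations agree");
}
```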
For better results - and I haven't tried it - try passing a vector to the function to fill. The new `Integers` type supports `DerefMut` to give a mutable slice which can be filled with a loop, or with `rayon` for very large vectors.

An alternative is to use ALTVEC and convert an iterator to a lazy vector. The cost of doing this is very small, although using the vector in R will almost certainly instantiate the entire vector on first use. `Integers` supports this behaviour through `from_values`.
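The "pass a vector in and fill it" idea can be sketched in plain Rust; here `&mut [i32]` stands in for the mutable slice an `Integers` value would expose via `DerefMut` (the function name and fill pattern are illustrative only):

```rust
// Fill a caller-provided buffer in place: no temporary, no copy,
// and the simple indexed loop is a good autovectorisation candidate.
fn fill_squares(out: &mut [i32]) {
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = (i * i) as i32;
    }
}

fn main() {
    // The caller owns the allocation (as R would); the callee only writes.
    let mut buf = vec![0i32; 6];
    fill_squares(&mut buf);
    assert_eq!(buf, [0, 1, 4, 9, 16, 25]);
    println!("{:?}", buf);
}
```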
Note that Rust's default build settings are very "safe", i.e. very slow. To get the best result on a platform, use `RUSTFLAGS="-C target-cpu=native -C opt-level=3"` or similar. We should default `rextendr::rust_source` to "release" and provide some way to set `RUSTFLAGS`.
@andy-thomason, can you comment on the matrix performance? While a 1.5x performance decrease in vector creation is a 'reasonable' tradeoff, the numbers we get when creating matrices are... discouraging.
I would have to look at the generated code to comment.
The current Matrix implementation has constructors that take a closure of `|i, j| -> elem`.
Much depends on whether the loop will vectorise. There is usually about a 10:1 difference in performance if it does.
The version:
fn rust_0s_intmat_dimmedvec(n: i32) -> Result<Robj> {
    let my_vec = vec!(0; n as usize * n as usize);
    let my_mat = r!(my_vec).set_attrib(dim_symbol(), [n as usize, n as usize]);
    my_mat
}
should ideally be done using `Integers::from_value`, as going through a `Vec` adds conversion overhead.
I might make an integers! macro along the lines of:

    integers![0; 1024]
    integers![1, 2, 3, 4]
    integers![|i| i+1; 1024]
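Such a macro could plausibly be written with `macro_rules!`; here is a rough standalone sketch that expands to a plain `Vec<i32>` (the real extendr version would presumably build an `Integers` instead, so this only illustrates the three call forms):

```rust
// Sketch of an integers! macro covering the three proposed forms.
// The closure arm must come first: a closure is itself an expression,
// so `integers![|i| i + 1; 1024]` would otherwise match the fill arm
// and produce a vector of closures.
macro_rules! integers {
    (|$i:ident| $body:expr; $n:expr) => {
        (0..$n).map(|$i| $body).collect::<Vec<i32>>()
    };
    ($v:expr; $n:expr) => {
        vec![$v; $n]
    };
    ($($v:expr),+ $(,)?) => {
        vec![$($v),+]
    };
}

fn main() {
    assert_eq!(integers![0; 4], vec![0, 0, 0, 0]);
    assert_eq!(integers![1, 2, 3, 4], vec![1, 2, 3, 4]);
    assert_eq!(integers![|i| i + 1; 4], vec![1, 2, 3, 4]);
    println!("all three forms expand correctly");
}
```

Arm order matters here because `macro_rules!` tries rules top to bottom and takes the first match.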
For short vectors (<1k) the R calls will dominate and we add extra overhead by making them thread safe.
We should default rextendr::rust_source to "release"
It's unclear that this is the right choice, as it makes compilation noticeably slower. For a function that is most likely going to be used interactively, it's not clear to me that this is the right tradeoff.
Bear in mind that converting from a `Vec` to an integer vector has some overhead. This is similar to returning a `std::vector` in Rcpp - not the fastest thing.
That used to be the case but not anymore. I know I've posted quite a few results from quite a few different functions so let's cut this down to size with what I believe to be the most direct comparisons.
# Very basic Native -> R
# The idea behind each test is to have in-hand a native integer vector
# obtained via the result of some native operations on `n` that should be
# returned to R as an integer vector.
# our baseline
r_fn <- function(n) {
    vector(mode = "integer", length = n)
}
Rcpp::cppFunction('
std::vector<int> rcpp_fn(int n) {
    std::vector<int> my_vec(n, 0);
    return my_vec;
}')
rextendr::rust_function('
    fn rextendr_defaults(n: i32) -> Vec<i32> {
        let my_vec = vec!(0; n as usize);
        my_vec
    }
')

rextendr::rust_function('
    fn rextendr_release(n: i32) -> Vec<i32> {
        let my_vec = vec!(0; n as usize);
        my_vec
    }',
    profile = "release"
)

rextendr::rust_function('
    fn rextendr_release_devapi(n: i32) -> Vec<i32> {
        let my_vec = vec!(0; n as usize);
        my_vec
    }',
    profile = "release",
    extendr_deps = list(
        `extendr-api` = list(git = "https://github.com/extendr/extendr")
    )
)
# cargo, with modification of the function required.
# Afaik it doesn't have any automatic type conversion handling.
cargo_fn <- cargo::rust_fn(n, '
    let rust_n = n.as_usize();
    let my_vec = vec!(0; rust_n);
    let (r_vec, xr_vec) = Rval::new_vector_integer(rust_n, &mut pc);
    for (relem, rustelem) in xr_vec.iter_mut().zip(my_vec) {
        *relem = rustelem;
    };
    r_vec
')
Unit: relative
expr min lq mean median uq max neval cld
r_fn(N) 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000 1000 a
rcpp_fn(N) 3.583614 1.279764 1.333190 1.326804 1.453437 0.9885287 1000 a
cargo_fn(N) 3.509943 1.264357 1.281811 1.309269 1.406735 1.2350064 1000 a
rextendr_defaults(N) 307.610555 70.658629 51.260960 71.826732 71.052254 4.7738947 1000 c
rextendr_release(N) 9.555352 2.674739 2.742079 2.670020 2.785710 16.6881294 1000 b
rextendr_release_devapi(N) 10.923281 2.986696 2.674138 3.011260 3.150232 1.1583600 1000 b
Perhaps it's just me not understanding what all is required for 'safety' and what a reasonable trade-off in time is, but when I used PyO3 I was so impressed with Rust's improvement over my Python solution that I had to try it with R, and... well, ouch. I think a big part of that 'ouch' is that I am familiar with Rcpp, and that's my go-to target when I need more performance out of my R, so that is what I compare against. The cargo benchmark shows that Rust can indeed be just as performant at getting the values back to R, but I don't know what safety checks are in place that account for all the time between that cargo benchmark and the rextendr times.
This also supports, in my opinion, having release be the default because that is going to be the first impression.
So, I've been doing some reading to further understand what @andy-thomason stated in his earlier post along with what the overhead of safety is.
From looking at cargo's `new_vector_integer`, it is allocating, incrementing the protect count, and returning `-> (Self, &'static mut [i32])` (doc, src), which, if I am understanding what I've been reading, is inherently unsafe in multiple ways; a primary goal of extendr is to ensure safety by protecting the SEXP behind the Robj interface and providing safe methods of interaction.
From the rust docs...
Note that all interaction with a static mut is unsafe, both reading and writing. Dealing with global mutable state requires a great deal of care.
So, if that very basic explanation is correct, the overhead between cargo's static mut reference methodology and extendr's Robj methodology is the 'cost' of safety, extensibility, and ease of use (i.e. drop-in Rust function usage), rather than the cost of a particular method of wrapping values.
Does that sound right?
Also, vectors over a certain size are currently built as ALTREP. We should probably make this optional until we can do it at zero cost.
While this is an important issue, I believe this particular conversation has gone stale. We should explore a benchmark suite, if only internally, to keep track of potential performance regressions. These things tend to grow into their own projects/repos, so for now we'll leave it at that.
Note that this PR https://github.com/extendr/extendr/pull/401 is likely to fix things.
It would be great if there could be a comparison between extendr and Rcpp in terms of performance.