Open utterances-bot opened 1 year ago
To even out the competition, here is a version of the same algorithm written for {cpp11}
(praise GPT-4 for rewriting C++ for me):
#include <cpp11.hpp>
#include <vector>
#include <string>
[[cpp11::register]]
cpp11::writable::integers count_seq_chars_to_ref_cpp11(cpp11::strings x, const std::string& y) {
cpp11::writable::integers result(x.size());
for (int i = 0; i < x.size(); ++i) {
std::string xi_str(x[i]);
auto it_x = xi_str.begin();
auto it_y = y.begin();
while (it_x != xi_str.end() && it_y != y.end() && *it_x == *it_y) {
++it_x;
++it_y;
}
result[i] = std::distance(xi_str.begin(), it_x);
}
return result;
}
The main difference is that it uses special containers (like strings
), that are optimized for R memory consumption and are zero-copy.
Here is the benchmark outputs:
bench::mark(
cpp = count_seq_chars_to_ref_cpp(sample_strings, "abcd123"),
cpp11 = count_seq_chars_to_ref_cpp11(sample_strings, "abcd123"),
rust = count_seq_chars_to_ref_rust(sample_strings, "abcd123")
) |> print()
#> # A tibble: 3 x 13
#> expression min median itr/s~1 mem_a~2 gc/se~3 n_itr n_gc total~4 result
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:b> <dbl> <int> <dbl> <bch:t> <list>
#> 1 cpp 2.82ms 3.04ms 322. 784KB 4.12 156 2 485ms <dbl>
#> 2 cpp11 6.95ms 7.43ms 132. 391KB 13.6 58 6 441ms <int>
#> 3 rust 1.64ms 1.76ms 560. 391KB 4.10 273 2 488ms <int>
#> # ... with 3 more variables: memory <list>, time <list>, gc <list>, and
#> # abbreviated variable names 1: `itr/sec`, 2: mem_alloc, 3: `gc/sec`,
#> # 4: total_time
Windows 11 10.0.22621 x64, i7-11700KF, R 4.2.3
{rextendr}
has the same memory footprint as {cpp11}
when using proper wrapper types, and it seems to be at least as fast as reference C++ implementation.
This has been sped up even more!
fn count_(x: Strings, y: &str) -> Integers {
x.iter()
.map(|xi|
xi.bytes().zip(y.bytes())
.take_while(|(a, b)| a == b)
.count()
).map(|x| (x as i32).into())
.collect()
}
Two tricks here, the zip can be separated into two values from the closure. The second thing shoutout @cgmossa for suggesting to use bytes()
instead of chars.
This shaves off ~0.2ms per iteration.
#> A tibble: 2 × 13
#> expression min median itr/s…¹ mem_a…² gc/se…³ n_itr n_gc total…⁴ result memory time
#> <bch:expr> <bch:> <bch:> <dbl> <bch:b> <dbl> <int> <dbl> <bch:t> <list> <list> <list>
#> 1 chars 1.36ms 1.41ms 701. 391KB 7.30 9897 103 14.1s <int> <Rprofmem> <bench_tm>
#> 2 bytes2 1.16ms 1.21ms 818. 391KB 8.18 9901 99 12.1s <int> <Rprofmem> <bench_tm>
Great article! (I only just saw it now, sorry - I don't suppose you have an RSS feed, do you?).
A slightly faster R version might be
str_overlap <- function(x, ref) {
sapply(x, \(y) {
s <- strsplit(y, "")[[1]]
r <- strsplit(ref, "")[[1]]
len <- 0
comp <- TRUE
# assuming x and ref are always the same length
while (len <= length(s) && comp) {
len <- len + 1
comp <- s[len] == r[len]
}
len - 1
}, USE.NAMES = FALSE)
}
which can be further improved automatically by wrapping it in
f <- compiler::cmpfun(f)
On my machine this is at least 2.5x faster than the R version you have, and an additional 20% faster when compiled. It's a tired snail compared to any of the rust versions, of course. I used this as a reason to try out rextendr
for myself and the improvement tips are fantastic.
This is nuts! I dind't know about the compiler
package or the fact that R can enable JIT (is it enabled by default?).
Notes to self to revisit:
JIT is supposedly on by default (since 3.4) depending on your settings, but I'm out of my depths regarding how that impacts these results. I believe it kicks in after a couple (few?) function invocations. I can't speak to how that differs from what cmpfun()
does, but perhaps it at least makes it 'definitely compiled'.
Josiah Parry - Feeling rusty: counting characters
https://josiahparry.com/posts/2023-04-13-counting-chars/