utterances-bot commented 1 year ago

Josiah Parry - Feeling rusty: counting characters

https://josiahparry.com/posts/2023-04-13-counting-chars/

Ilia-Kosenkov commented 1 year ago

To even out the competition, here is a version of the same algorithm written for {cpp11} (praise GPT-4 for rewriting C++ for me):

#include <cpp11.hpp>
#include <vector>
#include <string>

[[cpp11::register]]
cpp11::writable::integers count_seq_chars_to_ref_cpp11(cpp11::strings x, const std::string& y) {
  cpp11::writable::integers result(x.size());

  for (int i = 0; i < x.size(); ++i) {
    std::string xi_str(x[i]);
    auto it_x = xi_str.begin();
    auto it_y = y.begin();

    while (it_x != xi_str.end() && it_y != y.end() && *it_x == *it_y) {
      ++it_x;
      ++it_y;
    }

    result[i] = std::distance(xi_str.begin(), it_x);
  }

  return result;
}

The main difference is that it uses special containers (like strings), that are optimized for R memory consumption and are zero-copy.

Here is the benchmark outputs:

bench::mark(
  cpp = count_seq_chars_to_ref_cpp(sample_strings, "abcd123"),
  cpp11 = count_seq_chars_to_ref_cpp11(sample_strings, "abcd123"),
  rust = count_seq_chars_to_ref_rust(sample_strings, "abcd123")
) |> print()
#> # A tibble: 3 x 13
#>   expression      min  median itr/s~1 mem_a~2 gc/se~3 n_itr  n_gc total~4 result
#>   <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>
#> 1 cpp          2.82ms  3.04ms    322.   784KB    4.12   156     2   485ms <dbl> 
#> 2 cpp11        6.95ms  7.43ms    132.   391KB   13.6     58     6   441ms <int> 
#> 3 rust         1.64ms  1.76ms    560.   391KB    4.10   273     2   488ms <int> 
#> # ... with 3 more variables: memory <list>, time <list>, gc <list>, and
#> #   abbreviated variable names 1: `itr/sec`, 2: mem_alloc, 3: `gc/sec`,
#> #   4: total_time

^{Windows 11 10.0.22621 x64, i7-11700KF, R 4.2.3}

{rextendr} has the same memory footprint as {cpp11} when using proper wrapper types, and it seems to be at least as fast as reference C++ implementation.

JosiahParry commented 1 year ago

This has been sped up even more!

fn count_(x: Strings, y: &str) -> Integers {
    x.iter()
        .map(|xi| 
            xi.bytes().zip(y.bytes())
                .take_while(|(a, b)| a == b)
                .count()
        ).map(|x| (x as i32).into())
        .collect()
}

Two tricks here, the zip can be separated into two values from the closure. The second thing shoutout @cgmossa for suggesting to use bytes() instead of chars.

This shaves off ~0.2ms per iteration.

#> A tibble: 2 × 13
#>   expression    min median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result memory     time      
#>  <bch:expr> <bch:> <bch:>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list> <list>     <list>    
#> 1 chars      1.36ms 1.41ms    701.   391KB    7.30  9897   103   14.1s <int>  <Rprofmem> <bench_tm>
#> 2 bytes2     1.16ms 1.21ms    818.   391KB    8.18  9901    99   12.1s <int>  <Rprofmem> <bench_tm>

jonocarroll commented 1 year ago

Great article! (I only just saw it now, sorry - I don't suppose you have an RSS feed, do you?).

A slightly faster R version might be

str_overlap <- function(x, ref) {
  sapply(x, \(y) {
    s <- strsplit(y, "")[[1]]
    r <- strsplit(ref, "")[[1]]
    len <- 0
    comp <- TRUE
    # assuming x and ref are always the same length
    while (len <= length(s) && comp) {
      len <- len + 1
      comp <- s[len] == r[len]
    }
    len - 1
  }, USE.NAMES = FALSE)
}

which can be further improved automatically by wrapping it in

f <- compiler::cmpfun(f)

On my machine this is at least 2.5x faster than the R version you have, and an additional 20% faster when compiled. It's a tired snail compared to any of the rust versions, of course. I used this as a reason to try out rextendr for myself and the improvement tips are fantastic.

JosiahParry commented 1 year ago

This is nuts! I dind't know about the compiler package or the fact that R can enable JIT (is it enabled by default?).

Notes to self to revisit:

jonocarroll commented 1 year ago

JIT is supposedly on by default (since 3.4) depending on your settings, but I'm out of my depths regarding how that impacts these results. I believe it kicks in after a couple (few?) function invocations. I can't speak to how that differs from what cmpfun() does, but perhaps it at least makes it 'definitely compiled'.

JosiahParry / quarto-site

posts/2023-04-13-counting-chars/ #6

Josiah Parry - Feeling rusty: counting characters