DorisAmoakohene / Doris-Amoakohene-atime-for-asymptotic-Timings


bench::press comparison with atime #12

Open DorisAmoakohene opened 5 days ago

DorisAmoakohene commented 5 days ago

@tdhock I want to include this in the section comparing bench::press and atime::atime. What do you think, and how best can I go about it?


library(atime)

subject.size.vec <- unique(as.integer(10^seq(0, 2, l = 20)))

atime.list <- atime::atime(
  N = subject.size.vec,
  setup = {

    subject <- paste(rep("a", N), collapse = "")
    pattern <- paste(rep(c("a?", "a"), each = N), collapse = "")
  },
  times = 10,

  PCRE = regexpr(pattern, subject, perl = TRUE),
  TRE = regexpr(pattern, subject, perl = FALSE)
)

print(atime.list)

[Image: atime.list]


atime list with 72 measurements for
PCRE(N=1 to 18)
TRE(N=1 to 100)
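
To make the timed inputs concrete, here is a minimal base-R sketch of what `setup` builds for one small size. The value `N <- 3` is a hypothetical choice for illustration only; the `subject` and `pattern` expressions are copied from the code above.

```r
# Hypothetical small size, chosen only so the strings are easy to read.
N <- 3

# Same construction as in the setup block above.
subject <- paste(rep("a", N), collapse = "")
pattern <- paste(rep(c("a?", "a"), each = N), collapse = "")

print(subject)  # N copies of "a"
print(pattern)  # N copies of "a?" followed by N copies of "a"

# Run both engines once; each should find the same match.
pcre.match <- regexpr(pattern, subject, perl = TRUE)
tre.match <- regexpr(pattern, subject, perl = FALSE)
print(pcre.match)
print(tre.match)
```

The pattern can only match by leaving every optional `a?` empty, which is presumably why the PCRE timings stop at a much smaller N than the TRE timings in the output above.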

library(bench)

set.seed(42)
subject.size.vec <- unique(as.integer(10^seq(0, 2, l = 20)))  
create_subject_pattern <- function(N) {
  subject <- paste(rep("a", N), collapse = "")
  pattern <- paste(rep(c("a?", "a"), each = N), collapse = "")
  list(subject = subject, pattern = pattern)
}

results <- bench::press(
  N = subject.size.vec,                 
  engine = c("PCRE", "TRE"),            
  {

    data <- create_subject_pattern(N)
    subject <- data$subject
    pattern <- data$pattern

    bench::mark(
      min_iterations = 10,
      match = if (engine == "PCRE") {
        regexpr(pattern, subject, perl = TRUE)
      } else {
        regexpr(pattern, subject, perl = FALSE)
      }
    )
  }
)

print(results)

[Image: results]

> results
# A tibble: 34 × 15
   expression     N engine      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
   <bch:expr> <int> <chr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
 1 match          1 PCRE    147.9µs  176.2µs    4929.         0B        0  2464     0
 2 match          2 PCRE    124.7µs  146.4µs    5186.         0B        0  2592     0
 3 match          3 PCRE    125.4µs  129.7µs    6853.         0B        0  3425     0
 4 match          4 PCRE    125.7µs  137.4µs    6390.         0B        0  3193     0
 5 match          5 PCRE    126.4µs  129.6µs    6786.         0B        0  3392     0
 6 match          6 PCRE    133.5µs  153.3µs    5792.         0B        0  2895     0
 7 match          8 PCRE    159.1µs  167.4µs    5206.         0B        0  2602     0
 8 match         11 PCRE    235.2µs  243.5µs    3740.         0B        0  1870     0
 9 match         14 PCRE      902µs    957µs     942.         0B        0   471     0
10 match         18 PCRE     15.1ms   16.2ms      50.4        0B        0    26     0
# ℹ 24 more rows
# ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>, time <list>,
#   gc <list>
# ℹ Use `print(n = ...)` to see more rows
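
As a quick check on the row count, this base-R sketch rebuilds the size grid and crosses it with the two engines. It uses `expand.grid` only to count combinations (an assumption for illustration; it is not how `bench::press` is implemented).

```r
# Same size grid as above; length.out is spelled out instead of the abbreviation l.
subject.size.vec <- unique(as.integer(10^seq(0, 2, length.out = 20)))
print(subject.size.vec)

# Truncating to integer and dropping duplicates leaves fewer than 20 sizes.
n.sizes <- length(subject.size.vec)
print(n.sizes)

# One row per (N, engine) combination, as in the bench::press result.
combos <- expand.grid(N = subject.size.vec, engine = c("PCRE", "TRE"))
print(nrow(combos))
```

The number of combinations should agree with the 34 rows reported in the tibble header above.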

tdhock commented 4 days ago

That is a good start. Maybe you can use perl=TRUE or FALSE instead of engine=PCRE or TRE? That would be a good example for discussing the atime_grid function too. For a more complex example of how to use bench::press, see https://github.com/tdhock/atime/pull/68/commits/7aed80d21c02732180283fc64292a60ab32e6117 (it should probably be simplified for the paper, though).

tdhock commented 4 days ago

Here is an improvement / simplification of your proposal for comparing PCRE and TRE:

max.N <- 20
(subject.size.vec <- unique(as.integer(10^seq(0, log10(max.N), l = 20))))
create_subject_pattern <- function(N)list(
  subject=paste(rep("a", N), collapse = ""),
  pattern=paste(rep(c("a?", "a"), each = N), collapse = ""))
perl.values <- c(TRUE,FALSE)
press_result <- bench::press(
  N = subject.size.vec,                 
  perl = perl.values,
  with(create_subject_pattern(N), bench::mark(
    iterations = 10,
    regexpr(pattern, subject, perl = perl))))
library(ggplot2)
gg <- ggplot()+
  geom_line(aes(
    N, as.numeric(median), color=perl),
    data=press_result)+
  scale_x_log10(limits=c(NA,max.N*2))+
  scale_y_log10()
directlabels::direct.label(gg,"right.polygons")

atime_result <- atime::atime(
  N = subject.size.vec,
  setup=N.data <- create_subject_pattern(N),
  expr.list=atime::atime_grid(list(
    perl=perl.values),
    regexpr=regexpr(N.data$pattern, N.data$subject, perl = perl)))
plot(atime_result)

DorisAmoakohene commented 1 day ago

@tdhock I ran the code above; see the results below.

max.N <- 20
(subject.size.vec <- unique(as.integer(10^seq(0, log10(max.N), l = 20))))
create_subject_pattern <- function(N)list(
  subject=paste(rep("a", N), collapse = ""),
  pattern=paste(rep(c("a?", "a"), each = N), collapse = ""))
perl.values <- c(TRUE,FALSE)
press_result <- bench::press(
  N = subject.size.vec,                 
  perl = perl.values,
  with(create_subject_pattern(N), bench::mark(
    iterations = 10,
    regexpr(pattern, subject, perl = perl))))
library(ggplot2)
gg <- ggplot()+
  geom_line(aes(
    N, as.numeric(median), color=perl),
    data=press_result)+
  scale_x_log10(limits=c(NA,max.N*2))+
  scale_y_log10()
directlabels::direct.label(gg,"right.polygons")

atime_result <- atime::atime(
  N = subject.size.vec,
  setup=N.data <- create_subject_pattern(N),
  expr.list=atime::atime_grid(list(
    perl=perl.values),
    regexpr=regexpr(N.data$pattern, N.data$subject, perl = perl)))

[Image: press_result]

[Image: atime_result]

DorisAmoakohene commented 1 day ago

@tdhock I saw a table like this in your paper on reshaping:

pkg::function     single  multiple  regex  na.rm  types    list
atime::atime      yes     yes       no     yes    numeric  yes
atime::atime_pkg  yes     yes       no     yes    numeric  yes
bench::press      no      yes       no     no     numeric  no

single: Whether the function supports evaluating a single expression.
multiple: Whether the function supports evaluating multiple expressions or combinations.
regex: Whether the function uses or supports regex for selecting parameters or processing input.
na.rm: Whether the function has an option to remove NA values.
types: Types of data supported (e.g., numeric, character, any).
list: Whether the function can handle lists or list-like structures.

What do you think about adding this to the related work, or to the bench::press comparison with atime section?

tdhock commented 19 hours ago

> What do you think about adding this to the related work, or to the bench::press comparison with atime section?

Don't you already have bench in Table 1?