PolMine / RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' (CWB)

The cost of not passing pointers #52

Closed ablaette closed 2 years ago

ablaette commented 2 years ago

The new package version (v0.5.0.9004) includes experimental, unexported functions RcppCWB:::.cl_find_corpus(), RcppCWB:::.cl_new_attribute() and RcppCWB:::.cpos_to_id() that rely on passing pointers rather than finding and creating corpus pointers and attribute pointers at the C++/C level again and again. There is a performance gain, but it is small: in the benchmark below, 5000 calls take 1.396 s elapsed with a pre-built pointer, against 1.589 s for the exported cl_cpos2id() and 1.397 s for the internal RcppCWB:::.cl_cpos2id().

library(RcppCWB)

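# Resolve the corpus and the attribute once; the attribute pointer p is reused in every call below.
corpus_ptr <- RcppCWB:::.cl_find_corpus(corpus = "reuters", registry = Sys.getenv("CORPUS_REGISTRY"))
p <- RcppCWB:::.cl_new_attribute(corpus_ptr, "word", 1)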

system.time(
  lapply(
    1:5000,
    function(i) RcppCWB:::.cpos_to_id(p, 0L:4000L)
  )
)
# user  system elapsed 
# 1.383   0.013   1.396 

system.time(
  lapply(
    1:5000,
    function(i) cl_cpos2id(
      corpus = "REUTERS",
      p_attribute = "word",
      cpos = 0L:4000L
    )
  )
)
# user  system elapsed 
# 1.545   0.043   1.589 

system.time(
  lapply(
    1:5000,
    function(i)
      RcppCWB:::.cl_cpos2id(
        corpus = "REUTERS",
        p_attribute = "word",
        registry = Sys.getenv("CORPUS_REGISTRY"),
        cpos = 0L:4000L
      )
  )
)
# user  system elapsed 
# 1.387   0.009   1.397
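
To put the three timings side by side, here is a small sketch that converts the elapsed times reported above into a mean cost per call and a relative saving. The numbers are copied from the single runs above, so they carry whatever noise a single system.time() run has; the variable names are only illustrative.

```r
# Elapsed times in seconds, copied from the three system.time() runs above
elapsed <- c(
  pointer  = 1.396,  # RcppCWB:::.cpos_to_id() with a pre-built attribute pointer
  exported = 1.589,  # cl_cpos2id()
  internal = 1.397   # RcppCWB:::.cl_cpos2id()
)
n_calls <- 5000      # each benchmark repeats the lookup 5000 times

# Mean cost of a single call, in microseconds
per_call_us <- elapsed / n_calls * 1e6

# Time saved by the pointer-based variant relative to each variant, in percent
saving_pct <- (elapsed - elapsed[["pointer"]]) / elapsed * 100

print(round(per_call_us, 1))
print(round(saving_pct, 1))
```

On these figures the pointer-based call saves roughly 12% against the exported cl_cpos2id() and about 0.1% against the internal RcppCWB:::.cl_cpos2id(). That would suggest most of the gap comes from what the exported wrapper does before it reaches the internal function, not from looking up the corpus and attribute again, though a single run per variant is not enough to be sure.
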
ablaette commented 2 years ago

I implemented variants of the CL functions that will accept pointers as input. Maybe this is not about performance, but a path to being able to write more concise code.
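
As a rough illustration of the "more concise code" idea, the sketch below wraps the pointer setup in a closure so that corpus and attribute are named once and later lookups take only corpus positions. make_id_lookup() is a hypothetical helper, not part of RcppCWB; it assumes the unexported functions of v0.5.0.9004 with the argument order used in the benchmark above, and the third argument to .cl_new_attribute() is copied from that call without knowing its exact meaning.

```r
# Hypothetical helper (not part of RcppCWB): resolve the corpus and the
# attribute once and return a function that maps corpus positions to ids.
# Assumes the experimental, unexported functions of RcppCWB v0.5.0.9004.
make_id_lookup <- function(corpus, p_attribute,
                           registry = Sys.getenv("CORPUS_REGISTRY")) {
  corpus_ptr <- RcppCWB:::.cl_find_corpus(corpus = corpus, registry = registry)
  # The value 1 is copied from the benchmark call above (assumption: it
  # selects the attribute type).
  attr_ptr <- RcppCWB:::.cl_new_attribute(corpus_ptr, p_attribute, 1)
  # The returned closure keeps attr_ptr alive and reuses it on every call.
  function(cpos) RcppCWB:::.cpos_to_id(attr_ptr, cpos)
}
```

With a registry that contains the REUTERS corpus, usage would look like `word_ids <- make_id_lookup("reuters", "word")` followed by `word_ids(0L:4000L)`, so corpus, attribute and registry no longer have to be repeated at each call site.
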