The new package version (v0.5.0.9004) includes the experimental, unexported functions RcppCWB:::.cl_find_corpus(), RcppCWB:::.cl_new_attribute() and RcppCWB:::.cpos_to_id(). They rely on passing pointers around rather than finding and creating corpus and attribute pointers at the C/C++ level on every call. There is a performance gain, but it is small: in the benchmark below, the pointer-based call is roughly 12% faster than the exported cl_cpos2id() and on par with the unexported RcppCWB:::.cl_cpos2id().
library(RcppCWB)

# Resolve the corpus and attribute pointers once, then reuse them.
c <- RcppCWB:::.cl_find_corpus(corpus = "reuters", registry = Sys.getenv("CORPUS_REGISTRY"))
p <- RcppCWB:::.cl_new_attribute(c, "word", 1)

# Variant 1: pointer-based lookup
system.time(
  lapply(
    1:5000,
    function(i) RcppCWB:::.cpos_to_id(p, 0L:4000L)
  )
)
#    user  system elapsed
#   1.383   0.013   1.396

# Variant 2: exported wrapper
system.time(
  lapply(
    1:5000,
    function(i) cl_cpos2id(
      corpus = "REUTERS",
      p_attribute = "word",
      cpos = 0L:4000L
    )
  )
)
#    user  system elapsed
#   1.545   0.043   1.589

# Variant 3: unexported function called directly, with an explicit registry
system.time(
  lapply(
    1:5000,
    function(i)
      RcppCWB:::.cl_cpos2id(
        corpus = "REUTERS",
        p_attribute = "word",
        registry = Sys.getenv("CORPUS_REGISTRY"),
        cpos = 0L:4000L
      )
  )
)
#    user  system elapsed
#   1.387   0.009   1.397
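To put the three timings side by side, here is a small sketch that only does arithmetic on the elapsed values reported above. The vector names (pointer, exported, internal) are labels chosen for this illustration, and the numbers come from a single run, so the percentages are indicative at best.

```r
# Elapsed seconds as reported in the benchmarks above (5000 repetitions each).
elapsed <- c(
  pointer  = 1.396,  # RcppCWB:::.cpos_to_id() with a reused pointer
  exported = 1.589,  # cl_cpos2id()
  internal = 1.397   # RcppCWB:::.cl_cpos2id()
)

# Time saved by the pointer-based variant relative to each variant, in percent.
saving_pct <- round(100 * (1 - elapsed[["pointer"]] / elapsed), 1)
print(saving_pct)

# Average cost of a single call, in milliseconds.
per_call_ms <- round(1000 * elapsed / 5000, 3)
print(per_call_ms)
```

The saving is about 12% against the exported wrapper and about 0.1% against the direct internal call, which suggests most of the difference comes from the R-level wrapper rather than from the pointer lookup itself.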
I implemented variants of the CL functions that accept pointers as input. Maybe the benefit is less about performance than about being able to write more concise code.
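As an illustration of the conciseness argument, the following sketch wraps the pointer setup in a closure so that later calls need only the corpus positions. The helper make_cpos2id() is hypothetical and not part of RcppCWB. It assumes the experimental functions of v0.5.0.9004 with exactly the arguments used in the benchmark above, and the third argument to .cl_new_attribute() is copied verbatim from that call.

```r
# Hypothetical helper (not part of RcppCWB): resolve the corpus and attribute
# pointers once and return a function that reuses them for every lookup.
make_cpos2id <- function(corpus, p_attribute,
                         registry = Sys.getenv("CORPUS_REGISTRY")) {
  # Assumption: experimental, unexported functions as of v0.5.0.9004.
  corpus_ptr <- RcppCWB:::.cl_find_corpus(corpus = corpus, registry = registry)
  # The third argument (1) is taken as-is from the benchmark code.
  attr_ptr <- RcppCWB:::.cl_new_attribute(corpus_ptr, p_attribute, 1)
  # The closure keeps both pointers alive for as long as it exists.
  function(cpos) RcppCWB:::.cpos_to_id(attr_ptr, cpos)
}

# Usage (requires the REUTERS corpus and a configured CORPUS_REGISTRY):
# word_id <- make_cpos2id("reuters", "word")
# ids <- word_id(0L:4000L)
```

With such a helper, the corpus, attribute and registry are named once instead of being repeated in every call.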