Closed chainsawriot closed 10 months ago
051869f is 3x slower
require(quanteda); require(quanteda.proximity)
#> Loading required package: quanteda
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 70.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
#> Loading required package: quanteda.proximity
toks <- data_corpus_inaugural %>% tokens()
bench::mark(tokens_proximity(toks, c("a")))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:> <dbl> <bch:byt> <dbl>
#> 1 "tokens_proximity(toks, c(\"a\"))" 94.7ms 107ms 7.65 159MB 44.0
Created on 2023-11-21 with reprex v2.0.2
789d1fb: down to 2x slower.
Given that this introduces more functionality (phrases, etc.), I think this should be enough, although further optimization is certainly possible.
require(quanteda); require(quanteda.proximity)
#> Loading required package: quanteda
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 70.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
#> Loading required package: quanteda.proximity
toks <- data_corpus_inaugural %>% tokens()
bench::mark(tokens_proximity(toks, c("a")))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:> <dbl> <bch:byt> <dbl>
#> 1 "tokens_proximity(toks, c(\"a\"))" 53.1ms 63.2ms 13.1 98.9MB 56.0
bench::mark(quanteda::index(toks, c("a")))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:> <dbl> <bch:byt> <dbl>
#> 1 "quanteda::index(toks, c(\"a\"))" 5.22ms 5.59ms 175. 2.22MB 13.3
Created on 2023-11-21 with reprex v2.0.2
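For a quick sanity check, the medians posted above can be turned into ratios in base R. This is only a sketch: the numbers are copied by hand from the two `bench::mark()` outputs in this thread, the vector names are made up for illustration, and single-run medians with a GC in every iteration are noisy, so treat the results as rough.

```r
# Medians copied from the bench::mark() output in this thread (milliseconds).
# The element names are illustrative labels, not anything defined by the package.
median_ms <- c(
  proximity_051869f = 107,
  proximity_789d1fb = 63.2,
  index = 5.59
)

# Speedup of tokens_proximity() at 789d1fb relative to 051869f.
speedup <- median_ms[["proximity_051869f"]] / median_ms[["proximity_789d1fb"]]

# How many times slower tokens_proximity() at 789d1fb is than quanteda::index().
vs_index <- median_ms[["proximity_789d1fb"]] / median_ms[["index"]]

round(c(speedup = speedup, vs_index = vs_index), 2)
```

On these numbers, 789d1fb is roughly 1.7x faster than 051869f, and `tokens_proximity()` is still roughly 11x slower than `quanteda::index()` on the same tokens.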
20x slower than #26 recorded in #20 by @schochastics
Several possibilities