Open vjontiveros opened 1 year ago
Can you provide an explicit example?
Vicente J. Ontiveros wrote on Wed, 2 Aug 2023, 15:55:
I just made a microbenchmark of sample.int versus dqsample.int and while dqsample.int is faster when no argument prob is specified, the performance drops considerably (about 10x slower than sample.int for small prob vectors) if the argument is supplied. Could this function be optimised?
To be more explicit, I ran a quick test myself:
library(dqrng)
m <- 1e6
n <- 1e4
prob <- dqrunif(m)
bm <- bench::mark(sample.int(m, n, replace = TRUE, prob = prob),
dqsample.int(m, n, replace = TRUE, prob = prob),
check = FALSE)
bm[, 1:4]
#> # A tibble: 2 × 4
#> expression min median `itr/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl>
#> 1 sample.int(m, n, replace = TRUE, prob = prob) 13.24ms 13.95ms 70.8
#> 2 dqsample.int(m, n, replace = TRUE, prob = prob) 2.49ms 2.54ms 388.
m <- 1e1
prob <- dqrunif(m)
bm <- bench::mark(sample.int(m, n, replace = TRUE, prob = prob),
dqsample.int(m, n, replace = TRUE, prob = prob),
check = FALSE)
bm[, 1:4]
#> # A tibble: 2 × 4
#> expression min median `itr/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl>
#> 1 sample.int(m, n, replace = TRUE, prob = prob) 139µs 149µs 6691.
#> 2 dqsample.int(m, n, replace = TRUE, prob = prob) 526µs 564µs 1763.
Created on 2023-08-04 with reprex v2.0.2
While dqsample.int is much faster for large populations, the relation reverses for smaller ones. However, I do not see a 10x difference, so a concrete example of your tests would still help.
Anyway, it looks like I should also investigate the bisection method discussed in #18, or inspect why my original investigations at https://stubner.me/2022/12/roulette-wheel-selection-for-dqrng/ showed comparable performance for small populations.
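For reference, the bisection idea from #18 boils down to inverting the cumulative weights with a binary search. A minimal sketch (an illustration only, not dqrng's actual code; the function name is made up):

```r
# Weighted sampling with replacement via bisection: precompute the
# cumulative weights once, then locate each uniform draw with a binary
# search -- O(log m) per draw instead of O(m) for a linear scan.
weighted_sample_bisect <- function(m, n, prob) {
  cum <- cumsum(prob)
  u <- runif(n, 0, cum[m])
  # findInterval() does the binary search; +1 maps the interval to an index
  findInterval(u, cum) + 1L
}

set.seed(42)
table(weighted_sample_bisect(3, 1e5, c(0.1, 0.1, 0.8)))
```

Unlike rejection-based approaches, the cost per draw here does not depend on how skewed the weights are, at the price of the O(m) cumsum during setup.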
Sure, I just hope it's not something related to my own machine (an old Intel Core i5 running Ubuntu 20.04). In my typical usage, sample.int is faster for very small populations, but it is true that the difference shrinks as the population size grows. And many thanks for your useful package, it's going to cut the running time of my stochastic models for sure.
I did this:
> probs10 <- runif(10)
> probs100 <- runif(100)
> probs1000 <- runif(1000)
> probs10000 <- runif(10000)
>
> microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
+ "dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
+ "base100" = sample.int(100, 1, prob = probs100),
+ "dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
+ "base1000" = sample.int(1000, 1, prob = probs1000),
+ "dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
+ "base10000" = sample.int(10000, 1, prob = probs10000),
+ "dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
+ times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
base10 3.700 4.7330 6.025951 5.1540 5.7270 4452.855 10000
dqrng10 70.392 76.5140 84.831459 79.8555 83.4410 537.470 10000
base100 6.245 8.9940 10.307123 10.0465 10.8930 69.717 10000
dqrng100 74.215 81.0415 90.359994 84.7135 88.4810 4679.389 10000
base1000 71.343 74.6655 79.732079 75.7230 76.8580 6092.578 10000
dqrng1000 140.142 146.7970 158.571762 150.6020 154.5085 6375.078 10000
base10000 917.585 933.5860 967.388178 937.4065 970.5880 9099.942 10000
dqrng10000 987.884 1007.3070 1054.972957 1013.1880 1047.8260 66884.217 10000
Thanks! I have taken the liberty of changing the formatting of your code by surrounding it with three backticks, which makes it more readable. Using your code (and the reprex package for formatting) I get the following on an i7 CPU:
probs10 <- runif(10)
probs100 <- runif(100)
probs1000 <- runif(1000)
probs10000 <- runif(10000)
microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
"dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
"base100" = sample.int(100, 1, prob = probs100),
"dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
"base1000" = sample.int(1000, 1, prob = probs1000),
"dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
"base10000" = sample.int(10000, 1, prob = probs10000),
"dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
times = 10000)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> base10 2.176 2.8475 3.319717 3.1260 3.499 32.194 10000
#> dqrng10 1.601 1.9290 6.085725 2.1330 2.464 37518.282 10000
#> base100 3.061 4.7515 6.133829 6.0005 6.743 1603.356 10000
#> dqrng100 1.735 2.0720 2.474106 2.2750 2.625 28.024 10000
#> base1000 25.853 46.4780 50.104840 50.1280 53.542 1653.449 10000
#> dqrng1000 3.436 3.9220 4.342348 4.1450 4.486 33.646 10000
#> base10000 590.307 642.3895 660.203399 654.4490 667.192 6604.411 10000
#> dqrng10000 20.362 22.0310 22.866362 22.5240 23.263 57.284 10000
Created on 2023-08-04 with reprex v2.0.2
Here, going by the median, dqsample.int is always faster than sample.int. I will try with other CPUs later.
How did you install dqrng?
Wow, such a change. Probably I need to get a new computer... I installed dqrng 0.3.0 from CRAN using the usual install.packages(), and it compiled successfully. But now I have noticed something odd. I get warnings saying:
In dqrng::dqsample.int(100, 1, prob = probs100) :
Using 'prob' is not supported yet. Using default 'sample.int'.
which makes no sense...
The timing differences for the base R functions are noticeable but not that extreme; I would not get a new computer just yet. The differences for dqrng are extreme, though. Maybe you used the development version together with devtools::load_all()? That compiles without optimization and would explain the extreme difference. Instead, you can install the latest development version with devtools::install() from a working copy, or via
options(repos = c(
rstub = 'https://rstub.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
install.packages('dqrng')
The current CRAN version 0.3.0 does not have weighted sampling, which is why you get this warning.
I installed the latest version and got results similar to yours. Many thanks for the help, and sorry for the trouble!
> library(dqrng)
> probs10 <- runif(10)
> probs100 <- runif(100)
> probs1000 <- runif(1000)
> probs10000 <- runif(10000)
> microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
+ "dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
+ "base100" = sample.int(100, 1, prob = probs100),
+ "dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
+ "base1000" = sample.int(1000, 1, prob = probs1000),
+ "dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
+ "base10000" = sample.int(10000, 1, prob = probs10000),
+ "dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
+ times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
base10 3.753 4.5990 5.651133 4.9880 5.5900 153.807 10000
dqrng10 4.474 5.2730 7.003588 5.7930 7.0650 136.767 10000
base100 6.456 8.6000 10.915304 9.8830 10.7410 2656.511 10000
dqrng100 4.729 5.5805 7.348371 6.0895 7.4020 151.335 10000
base1000 70.510 74.4630 79.038821 75.6525 79.0570 2820.972 10000
dqrng1000 7.547 8.4450 10.334555 9.0000 10.3650 146.447 10000
base10000 916.296 934.8985 980.620371 952.5605 995.8445 5733.484 10000
dqrng10000 35.837 36.7360 40.042864 37.5065 39.6050 341.567 10000
You are welcome. Actually, I would like to keep the issue open, since I did see some strange effects. For a population of 10, drawing 1 item with dqsample.int is faster than with sample.int, but for 10^4 items it is actually slower. That is surprising, and I would like to look into it.
Unfortunately these tests are only scratching the surface. The performance of probabilistic sampling gets really bad if one (or a few) possibilities have much higher weights than the others. To quantify this, one can use max_weight / average_weight, which measures how many tries one needs on average before a draw is successful.
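A toy rejection sampler makes the max_weight / average_weight measure concrete (an illustration of the general technique, not dqrng's implementation; names are made up): propose an index uniformly and accept it with probability prob[i] / max(prob), so the expected number of proposals per accepted draw is max(prob) / mean(prob).

```r
# Toy rejection sampler for one weighted draw; returns the accepted index
# and the number of proposals it took.
rejection_draw <- function(prob) {
  pmax <- max(prob)
  tries <- 0L
  repeat {
    tries <- tries + 1L
    i <- sample.int(length(prob), 1L)            # uniform proposal
    if (runif(1) < prob[i] / pmax)               # accept w.p. prob[i]/pmax
      return(c(index = i, tries = tries))
  }
}

set.seed(1)
prob <- c(rep(1, 99), 1000)   # one dominant weight
draws <- replicate(1e3, rejection_draw(prob)["tries"])
mean(draws)                   # close to max(prob) / mean(prob), about 91
```

With a flat weight vector the ratio is near 1 and rejection sampling is essentially free, which would explain why benchmarks with runif() weights look harmless.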
I am not sure yet what a good cut-off point is for switching to a different algorithm, or whether it would be better to use the alias method right away, cf. https://www.keithschwarz.com/darts-dice-coins/.
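For completeness, here is a minimal sketch of the alias method (Vose's variant, following the write-up linked above; not dqrng's code and the names are made up). Setup is O(m), and each draw is O(1) regardless of how skewed the weights are:

```r
# Build the alias table: scale weights so the average is 1, then pair each
# "small" column (< 1) with a "large" one (>= 1) that donates the remainder.
build_alias <- function(prob) {
  m <- length(prob)
  p <- prob * m / sum(prob)
  alias <- integer(m); cut <- numeric(m)
  small <- which(p < 1); large <- which(p >= 1)
  while (length(small) > 0 && length(large) > 0) {
    s <- small[1]; small <- small[-1]
    l <- large[1]; large <- large[-1]
    cut[s] <- p[s]; alias[s] <- l
    p[l] <- p[l] + p[s] - 1            # mass donated to column s
    if (p[l] < 1) small <- c(small, l) else large <- c(large, l)
  }
  cut[c(small, large)] <- 1            # leftovers fill their own column
  list(cut = cut, alias = alias)
}

# Draw n indices: pick a column uniformly, then stay or jump to its alias.
alias_sample <- function(tbl, n) {
  m <- length(tbl$cut)
  i <- sample.int(m, n, replace = TRUE)
  flip <- runif(n) < tbl$cut[i]
  ifelse(flip, i, tbl$alias[i])
}
```

The trade-off versus rejection sampling is the fixed setup cost, which is why a cut-off based on population size and weight skew might still be worthwhile.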
For now I will have to back out weighted sampling and reopen #18 and #45.