daqana / dqrng

Fast Pseudo Random Number Generators for R
https://daqana.github.io/dqrng/

dqsample.int with arg prob sinks performance #52

Open vjontiveros opened 1 year ago

vjontiveros commented 1 year ago

I just made a microbenchmark of sample.int versus dqsample.int and while dqsample.int is faster when no argument prob is specified, the performance drops considerably (about 10x slower than sample.int for small prob vectors) if the argument is supplied. Could this function be optimised?

rstub commented 1 year ago

Can you provide an explicit example?


rstub commented 1 year ago

To be more explicit, I ran a quick test myself:

library(dqrng)

m <- 1e6
n <- 1e4
prob <- dqrunif(m)
bm <- bench::mark(sample.int(m, n, replace = TRUE, prob = prob),
                  dqsample.int(m, n, replace = TRUE, prob = prob),
                  check = FALSE)
bm[, 1:4]
#> # A tibble: 2 × 4
#>   expression                                           min   median `itr/sec`
#>   <bch:expr>                                      <bch:tm> <bch:tm>     <dbl>
#> 1 sample.int(m, n, replace = TRUE, prob = prob)    13.24ms  13.95ms      70.8
#> 2 dqsample.int(m, n, replace = TRUE, prob = prob)   2.49ms   2.54ms     388.

m <- 1e1
prob <- dqrunif(m)
bm <- bench::mark(sample.int(m, n, replace = TRUE, prob = prob),
                  dqsample.int(m, n, replace = TRUE, prob = prob),
                  check = FALSE)
bm[, 1:4]
#> # A tibble: 2 × 4
#>   expression                                           min   median `itr/sec`
#>   <bch:expr>                                      <bch:tm> <bch:tm>     <dbl>
#> 1 sample.int(m, n, replace = TRUE, prob = prob)      139µs    149µs     6691.
#> 2 dqsample.int(m, n, replace = TRUE, prob = prob)    526µs    564µs     1763.

Created on 2023-08-04 with reprex v2.0.2

While dqsample.int is much faster for large populations, the situation reverses for smaller ones. However, I do not see a 10x difference, so a concrete example of your tests would still help.

Anyway, it looks like I should also investigate the bisection method discussed in #18, or inspect why my original investigations at https://stubner.me/2022/12/roulette-wheel-selection-for-dqrng/ showed comparable performance for small populations.
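For reference, the bisection approach from #18 boils down to a binary search over the cumulative weights. A minimal plain-R sketch of the idea (function name is my own; this is not dqrng's actual code):

```r
# Weighted sampling with replacement via bisection over cumulative weights:
# O(m) setup, O(log m) per draw, independent of how skewed the weights are.
# Plain-R sketch of the idea from #18, not dqrng's implementation.
sample_bisect <- function(m, n, prob) {
  cw <- cumsum(prob)               # monotone cumulative weight vector
  u <- runif(n, 0, cw[m])          # uniform draws over the total weight
  findInterval(u, cw) + 1L         # binary search for the owning interval
}

set.seed(42)
x <- sample_bisect(3L, 1000L, c(0, 1, 0))  # only index 2 has weight
```

findInterval() does the bisection in compiled code, so even this plain-R version scales well; the per-draw cost is what matters here.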

vjontiveros commented 1 year ago

Sure, I just hope it's not something related to my own machine (an old Intel Core i5 running Ubuntu 20.04). In my typical usage, sample.int is faster for very small populations, but it is true that the difference shrinks as the population size grows. And many thanks for your useful package, it's going to cut the running time of my stochastic models for sure.

I did this:

> probs10 <- runif(10)
> probs100 <- runif(100)
> probs1000 <- runif(1000)
> probs10000 <- runif(10000)
> 
> microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
+                                "dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
+                                "base100" = sample.int(100, 1, prob = probs100),
+                                "dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
+                                "base1000" = sample.int(1000, 1, prob = probs1000),
+                                "dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
+                                "base10000" = sample.int(10000, 1, prob = probs10000),
+                                "dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
+                                times = 10000)
Unit: microseconds
       expr     min        lq        mean    median        uq       max neval
     base10   3.700    4.7330    6.025951    5.1540    5.7270  4452.855 10000
    dqrng10  70.392   76.5140   84.831459   79.8555   83.4410   537.470 10000
    base100   6.245    8.9940   10.307123   10.0465   10.8930    69.717 10000
   dqrng100  74.215   81.0415   90.359994   84.7135   88.4810  4679.389 10000
   base1000  71.343   74.6655   79.732079   75.7230   76.8580  6092.578 10000
  dqrng1000 140.142  146.7970  158.571762  150.6020  154.5085  6375.078 10000
  base10000 917.585  933.5860  967.388178  937.4065  970.5880  9099.942 10000
 dqrng10000 987.884 1007.3070 1054.972957 1013.1880 1047.8260 66884.217 10000

rstub commented 1 year ago

Thanks! I have taken the liberty of changing the formatting of your code by surrounding it with three backticks.

That way it is easier to read. Using your code (and the reprex package for formatting) I get the following on an i7 CPU:

probs10 <- runif(10)
probs100 <- runif(100)
probs1000 <- runif(1000)
probs10000 <- runif(10000)

microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
                               "dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
                               "base100" = sample.int(100, 1, prob = probs100),
                               "dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
                               "base1000" = sample.int(1000, 1, prob = probs1000),
                               "dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
                               "base10000" = sample.int(10000, 1, prob = probs10000),
                               "dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
                               times = 10000)
#> Unit: microseconds
#>        expr     min       lq       mean   median      uq       max neval
#>      base10   2.176   2.8475   3.319717   3.1260   3.499    32.194 10000
#>     dqrng10   1.601   1.9290   6.085725   2.1330   2.464 37518.282 10000
#>     base100   3.061   4.7515   6.133829   6.0005   6.743  1603.356 10000
#>    dqrng100   1.735   2.0720   2.474106   2.2750   2.625    28.024 10000
#>    base1000  25.853  46.4780  50.104840  50.1280  53.542  1653.449 10000
#>   dqrng1000   3.436   3.9220   4.342348   4.1450   4.486    33.646 10000
#>   base10000 590.307 642.3895 660.203399 654.4490 667.192  6604.411 10000
#>  dqrng10000  20.362  22.0310  22.866362  22.5240  23.263    57.284 10000

Created on 2023-08-04 with reprex v2.0.2

Here for the Median, dqsample.int is always faster than sample.int. I will try with other CPUs later.

How did you install dqrng?

vjontiveros commented 1 year ago

Wow, such a change. Probably I need to get a new computer... I installed dqrng 0.3.0 from CRAN using the usual install.packages, and it compiled successfully. But now I have seen something odd. I get warnings saying:

In dqrng::dqsample.int(100, 1, prob = probs100) :
  Using 'prob' is not supported yet. Using default 'sample.int'.

which makes no sense...

rstub commented 1 year ago

The timing differences for the base R functions are noticeable but not that extreme; I would not get a new computer just now. The differences for dqrng are extreme, though. Maybe you used the development version together with devtools::load_all()? That compiles without optimization and would explain the extreme difference. Instead, you can install the latest development version with devtools::install() from a working copy, or via

options(repos = c(
  rstub = 'https://rstub.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))
install.packages('dqrng') 

The current CRAN version 0.3.0 does not have weighted sampling, which is why you get this warning.
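As a quick check, the installed version can be compared against the 0.3.0 release: anything newer (e.g. an r-universe development build) should have weighted sampling. A small sketch using base R's version arithmetic (the helper name and the "0.3.0.1" dev version number are illustrative, not real release numbers):

```r
# The 'prob' warning means the installed dqrng predates weighted sampling,
# i.e. it is the 0.3.0 CRAN release or older. package_version() compares
# version strings numerically, component by component.
has_weighted_sampling <- function(v) package_version(v) > "0.3.0"

has_weighted_sampling("0.3.0")    # FALSE: CRAN release, warns on 'prob'
has_weighted_sampling("0.3.0.1")  # TRUE: a hypothetical dev build number

# For the actually installed package:
# has_weighted_sampling(packageVersion("dqrng"))
```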

vjontiveros commented 1 year ago

I installed the latest version and got results similar to yours. Many thanks for the help and sorry for the trouble!

> library(dqrng)
> probs10 <- runif(10)
> probs100 <- runif(100)
> probs1000 <- runif(1000)
> probs10000 <- runif(10000)
> microbenchmark::microbenchmark("base10" = sample.int(10, 1, prob = probs10),
+                                "dqrng10" = dqrng::dqsample.int(10, 1, prob = probs10),
+                                "base100" = sample.int(100, 1, prob = probs100),
+                                "dqrng100" = dqrng::dqsample.int(100, 1, prob = probs100),
+                                "base1000" = sample.int(1000, 1, prob = probs1000),
+                                "dqrng1000" = dqrng::dqsample.int(1000, 1, prob = probs1000),
+                                "base10000" = sample.int(10000, 1, prob = probs10000),
+                                "dqrng10000" = dqrng::dqsample.int(10000, 1, prob = probs10000),
+                                times = 10000)
Unit: microseconds
       expr     min       lq       mean   median       uq      max neval
     base10   3.753   4.5990   5.651133   4.9880   5.5900  153.807 10000
    dqrng10   4.474   5.2730   7.003588   5.7930   7.0650  136.767 10000
    base100   6.456   8.6000  10.915304   9.8830  10.7410 2656.511 10000
   dqrng100   4.729   5.5805   7.348371   6.0895   7.4020  151.335 10000
   base1000  70.510  74.4630  79.038821  75.6525  79.0570 2820.972 10000
  dqrng1000   7.547   8.4450  10.334555   9.0000  10.3650  146.447 10000
  base10000 916.296 934.8985 980.620371 952.5605 995.8445 5733.484 10000
 dqrng10000  35.837  36.7360  40.042864  37.5065  39.6050  341.567 10000

rstub commented 1 year ago

You are welcome. Actually, I would like to keep the issue open, since I did see some strange effects: for a population of 10, drawing 1 item with dqsample.int is faster than with sample.int, but drawing 10^4 items it is actually slower. That is surprising, and I would like to look into it.

rstub commented 1 year ago

Unfortunately these tests only scratch the surface. The performance of probabilistic sampling gets really bad if one (or a few) possibilities have much higher weights than the others. To quantify this, one can use max_weight / average_weight, which measures how many tries one needs on average before a draw is successful.
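The effect is easy to reproduce in plain R with a rejection-sampling ("roulette wheel") sketch, where max(prob) / mean(prob) is exactly the expected number of tries per accepted draw (the function name is my own, not a dqrng internal):

```r
# Rejection sampling: pick an index uniformly, accept with probability
# weight / max_weight. The expected number of tries per accepted draw is
# max(prob) / mean(prob), so one dominant weight ruins performance.
# Plain-R sketch, not dqrng's implementation.
sample_reject <- function(m, n, prob) {
  w_max <- max(prob)
  tries <- 0L
  out <- integer(n)
  for (i in seq_len(n)) {
    repeat {
      tries <- tries + 1L
      j <- sample.int(m, 1L)
      if (runif(1) < prob[j] / w_max) break  # accept proportionally
    }
    out[i] <- j
  }
  attr(out, "tries") <- tries
  out
}

set.seed(1)
prob <- c(rep(1, 99), 100)     # one weight dominates the other 99
res <- sample_reject(100L, 200L, prob)
max(prob) / mean(prob)         # expected tries per draw: about 50
```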

I am not sure yet what a good cut-off point is for switching to a different algorithm, or whether it would be better to use the alias method right away, cf. https://www.keithschwarz.com/darts-dice-coins/.
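For comparison, the alias method from that article gives O(1) draws no matter how skewed the weights are, at the cost of an O(m) setup table. A plain-R sketch of Vose's construction (again illustrative, not dqrng code):

```r
# Vose's alias method: build a table of cutoffs and alias indices in O(m);
# each draw is then one uniform column pick plus one biased coin flip,
# O(1) regardless of weight skew. Sketch after darts-dice-coins.
alias_setup <- function(prob) {
  m <- length(prob)
  p <- prob / sum(prob) * m        # rescale so the average entry is 1
  cut <- numeric(m)
  alias <- integer(m)
  small <- which(p < 1)
  large <- which(p >= 1)
  while (length(small) > 0 && length(large) > 0) {
    s <- small[1]; small <- small[-1]
    l <- large[1]; large <- large[-1]
    cut[s] <- p[s]                 # column s keeps p[s] of its own mass
    alias[s] <- l                  # the rest of the column belongs to l
    p[l] <- p[l] - (1 - p[s])      # l donated (1 - p[s]) of its mass
    if (p[l] < 1) small <- c(small, l) else large <- c(large, l)
  }
  cut[c(small, large)] <- 1        # leftovers keep their full column
  list(cut = cut, alias = alias)
}

alias_sample <- function(n, tbl) {
  i <- sample.int(length(tbl$cut), n, replace = TRUE)  # uniform column
  ifelse(runif(n) < tbl$cut[i], i, tbl$alias[i])       # stay or jump
}

set.seed(7)
tbl <- alias_setup(c(0.5, 0.3, 0.2))
x <- alias_sample(1e4, tbl)
```

Whether to switch algorithms at some cut-off or to build the alias table unconditionally is exactly the open question, since the table construction is wasted effort when only a few draws are needed.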

For now I will have to back out weighted sampling and reopen #18 and #45.