egallesio / STklos

STklos Scheme
http://stklos.net
GNU General Public License v2.0
69 stars 17 forks source link

Speedup integer division #527

Closed jpellegrini closed 1 year ago

jpellegrini commented 1 year ago

Both the GMP and the mini-GMP have functions to extract only the quotient and only the remainder from a division. We change integer_division so that if either of quotient or remainder is NULL, then it is not used, and a faster GMP function is called for bignums.

We also do the same for fixnums and doubles, for consistency and simplicity (so as not to treat bignums in a separate function), although there doesn't seem to be much of a speedup for fixnums.

TIMINGS

Ten runs of each, the time reported is the average.

Bignums

(let ((big (+ (expt 2 70) 3))
      (big2 (+ (expt 3 62) 99877))
      (a 1))
  (time
    (repeat 10_000_000
      (set! a (quotient big2 big))))
  a)

current code: 3984.07 ms
this patch:   1781.56 ms
(let ((big (+ (expt 2 70) 3))
      (big2 (+ (expt 3 62) 99877))
      (a 1))
  (time
    (repeat 10_000_000
      (set! a (remainder big2 big))))
  a)

current code: 3958.14 ms
this patch:   3360.73 ms
(let ((big (+ (expt 2 70) 3))
      (big2 (+ (expt 3 62) 99877))
      (a 1))
  (time
    (repeat 10_000_000
      (set! a (modulo big2 big))))
  a)

current code: 4026.56 ms
this patch:   3515.91 ms

Flonums

(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (quotient 10928.0 987.0))))
  a)

current code: 4010.85 ms
this patch:   2887.87 ms
(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (remainder 10928.0 987.0))))
  a)

current code: 3940.73 ms
this patch:   2890.27 ms
(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (modulo 10928.0 987.0))))
  a)

current code: 4073.73 ms
this patch:   2962.37 ms

Fixnums

(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (quotient 10928 987))))
  a)

current code: 1418 ms
this patch:   1420 ms
(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (remainder 10928 987))))
  a)

current code: 1455.19 ms
this patch:   1450.22 ms
(let ((a 1))
  (time
    (repeat 20_000_000
      (set! a (modulo 10928 987))))
  a)

current code: 1513.41 ms
this patch:   1526.12 ms

Odd? and even?for flonums

Interestingly, this one get really faster:

(let ((a 1)
      (k (+ 1.0 (expt 3 63))))
  (time
    (repeat 10_000_000
      (set! a (odd? k))))
  a)

current code: 12033.47 ms
this patch:    6485.03 ms
jpellegrini commented 1 year ago

I think this can be enhanced:

I'll change this PR soon

jpellegrini commented 1 year ago

Ok - I have made a better version of it. It's clearer.

I have also simplified int_quotient so it uses one less mpz_t variable, and calls mpz_clear on exit. This one is ready.

By the way - the timings in the commit message have been updated - the new version seems better than the first one I had submitted in the PR.

(I thought I'd use the new int_divide to replace int_quotient for making new rationals, but it wasn't worth it)

jpellegrini commented 1 year ago

And the really nice speedup happens when we divide a fixnum by a bignum:

(let ((big (+ (expt 2 70) 3))
      (fix 10010192)
      (a 1))
  (time
    (repeat 10_000_000
      (set! a (remainder fix big))))
  a)

current code: 3258.373 ms
this patch:    762.4 ms
(let ((big (+ (expt 2 70) 3))
      (fix 10010192)
      (a 1))
  (time
    (repeat 10_000_000
      (set! a (quotient fix big))))
  a)
current code: 3306.894 ms
this patch:    776.137 ms
jpellegrini commented 1 year ago

Ok, so I included some more commits related to integer division, including a considerable speedup to round (the idea is to do integer division, then just check if abs(remainder) / 2 is past the midpoint of all possible remainders.

All tests pass. I'm not sure there is more to include in this PR -- I believe it's ready.

egallesio commented 1 year ago

Hello @jpellegrini,

I have merged this PR. The enhancement is really impressive :+1: . For fixnum operations, you had similar performance with and without the patch. On my laptop, (an old i5), these operations were 10% slower with the patch. I have finally added a fast path in integer-division when arguments are both fixnums (without any conversion) and the code is now faster than before also in this case (which is probably the more frequent). Thanks a lot for this contribution.