Closed countvajhula closed 1 year ago
Notes from today's meeting: A Loophole in Qi Space. Esp @benknoble you may be interested in this "loophole" that Michael came up with as an alternative to some of the options we've talked about before ("restorative/peephole optimizations" vs inflating the core language).
I attempted to do some benchmarking of none?
's implementation, and found that (a) for arities of less than 100k, the timing of any implementation I used was sub-millisecond; (b) in most cases, using for/and
over not
was faster than composing not
with the for/or
version of any?
; and (c) the overhead of threads was too great to allow "racing" both versions to get the best of both worlds.
I tried with different distributions of the value #f
and random numbers in [0,1)
(called the "frequency") and with different orders of thread creation. I was actually quite surprised that for/or
didn't win out significantly for lower frequencies of #f
and for/and
in higher frequencies. The for/or
does clearly do far better than for/and
at frequency 0, but only "dramatically" so for n = 100k. At higher orders of magnitude, they appear maybe roughly the same?
The bars show the average of 5 runs, with error bars the size of the standard deviation.
Note that the y-axis is on a logarithmic (base-10) transform; otherwise, you wouldn't be able to see all sizes on the same graph.
👁️ 🍭 That's great data! We can aim to discuss it at this week's Qi meetup (happening on Thursday just FYI). Those charts seriously look amazing, I'm curious how you generated those. It could be useful to add the recipe to the Developer's Guide.
Hand-typed from the original; forgive typos 😅
#lang racket
(require plot file/convertible math/statistics pict)
(define (write-pict-to-svg p f)
(with-output-to-file f
(thunk
(write-bytes (convert p 'svg-bytes)))))
(define (not-o-any . args)
(not (for/or ([arg (in-list args)])
arg)))
(define (for/and-over-not . args)
(for/and ([arg (in-list args)])
(not arg)))
(define (race-threads1 . args)
(define me (current-thread))
(define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
(define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
(begin0 (thread-receive)
(kill-thread t1)
(kill-thread t2)))
(define (race-threads2 . args)
(define me (current-thread))
(define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
(define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
(begin0 (thread-receive)
(kill-thread t1)
(kill-thread t2)))
(define (generate n freq)
(build-list n (λ (_i)
(define n (random))
(and (>= n freq) n))))
(define ns (list #e1e5 #e1e6 #e1e7))
(define freqs (list 0.0 0.3 0.7 1.0))
(define fs (list not-o-any for/and-over-not race-threads1 race-threads2))
(struct experiment [f n freq stats] #:transparent)
(struct stats [min mean max stddev] #:transparent)
(define experiments
(map (match-lambda [(list n freq f) (experiment f n freq #f)])
(cartesian-product ns freqs fs)))
(define n-trials 5)
(define/match (run-experiment _e)
[((experiment f n freq #f))
(define times
(for/list ([_i n-trials])
(define-values (_res _cpu real _garbage) (time-apply f (generate n freq)))
real))
(experiment f n freq (make-stats times))])
(define (make-stats times)
(define μ (mean times))
(define σ (stddev/mean μ times))
(stats (apply min times) μ (apply max times) σ))
(define (experiments->renderers es skip x-min)
(define (x i) (+ 0.5 x-min (* i skip)))
(define label (object-name (experiment-f (first es))))
(define es-by-n (sort es < #:key experiment-n))
(define histogram
(discrete-histogram
(for/list ([e (in-list es-by-n)])
(match-define (experiment _ n _ (stats _ mean _ _ )) e)
(vector (format "n = ~a" n) mean))
#:skip skip #:x-min x-min #:label (~a label) #:color (add1 x-min)))
(define annotations
(list
(error-bars
(for/list ([(e i) (in-indexed (in-list es-by-n))])
(match-define (stats _ mean _ stddev) (experiment-stats e))
(vector (x i) mean stddev)))
#;
(for/list ([(e i) (in-indexed (in-list es-by-n))])
(match-define (stats min _ max _) (experiment-stats e))
(list (point-label (vector (x i) min) "min")
(point-label (vector (x i) max) "max")))))
(list histogram annotations))
(module+ main
(random-seed 0)
(displayln "running experiments")
(define results (time (map run-experiment experiments)))
(displayln "plotting")
(define results-by-freq (group-by experiment-freq results))
(define results-by-freq-by-f (map (curry group-by experiment-f) results-by-freq))
(define results-by-freq-by-f-sorted-by-freq
(sort results-by-freq-by-f < #:key (compose1 experiment-freq caar)))
(define skip (length fs))
(define p
(apply vc-append
(for/list ([group (in-list results-by-freq-by-f-sorted-by-freq)])
(define freq (experiment-freq (caar group)))
(parameterize ([plot-y-transform (axis-transform-bound log-transform 0.01 +inf.0)]
[plot-y-ticks (log-ticks)])
(plot-pict
(for/list ([(exps i) (in-indexed group)])
(experiments->renderers exps skip i))
#:title (format "Frequency: ~a" freq)
#:width (* 4 (plot-width))
#:y-min 0.001
#:x-label "n"
#:y-label "t")))))
(show-pict p)
(write-pict-to-svg p "bench.svg"))
@benknoble We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!
Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍
We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!
No, I think that's right.
Thanks for the link. I think for partition
, the easiest way to to make it non-core would be to have a compiler optimization that recognizes nested sieve
s and converts it to partition-values
(so that partition
-> nested sieve
-> partition-values
); this also optimizes human-written nested sieve
s, if there are any.
Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍
Great! Yeah, it isn't fast: it would be better to generate the data once, save it, and then chart whatever you want from there. I think it was roughly similar on my machine (I recall using 2 frequencies being 2–3 minutes).
Thanks for the link. I think for partition, the easiest way to to make it non-core would be to have a compiler optimization that recognizes nested sieves and converts it to partition-values (so that partition -> nested sieve -> partition-values); this also optimizes human-written nested sieves, if there are any.
Great idea! We can try it in the first optimizations PR.
This PR is ready -- any input welcome! As it will be reviewed at the end anyway as part of merging the integration branch into main, there's no pressing need for review at this stage and we can just merge it in a few days in any case.
Benchmarking recipe added to the wiki 🙂
Summary of Changes
Restore performance of Qi forms to pre-stratification levels.
qi-sdk/profile/report.rkt
instead of having it be a separate scriptSee Return to baseline performance for more information.
Public Domain Dedication
(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)