Fuzzy benchmarks - Githubissues

kana-sama commented 7 years ago

Calls of the same functions with the same arguments could be optimized by javascript engines and such benchmarks can't reveal some bottlenecks (f(5) - fast, f(6) - slow, f(7) - fast).

So imo it will be great to randomize arguments for any big iteration (50 times), like

fuzz2 : String -> (a -> b -> c) -> Generator a -> Generator b -> Benchmark

Of course there are some problems with compare, but I have some idea about stateful benchmarks (something like fold). And it's possible to realize fuzz above it with seed as state (I am reading the code right now).

Folds are basis for map/filter/other and they can be basis for benchmarks too :)

BrianHicks commented 7 years ago

so assuming we could use the existing fuzzer library, such a benchmark could look like:

fuzz "Dict.get" Dict.get (Fuzz.intRange 0 size) (Fuzz.constant <| dictOfSize size)

Do I understand you correctly?

kana-sama commented 7 years ago

Are there any differences from generators of Random core package? They are very flexible. random-extra has lots of additional generators

let
    dict =
        Dict.fromList
            [ ( "a", 1 )
            , ( "b", 2 )
            ]

    oneOfKey = 
        dict
            |> Dict.keys
            |> Random.Extra.sample
            |> Random.map (Maybe.withDefault "")
in
    fuzz2 "Dict.get" Dict.get oneOfKey (Random.Extra.constant dict)

zwilias commented 7 years ago

In order to prevent the optimization that repeated calls with the exact same args would cause, while also trying to stay away from the overhead of using a randomizer, it might be worth looking into using http://package.elm-lang.org/packages/naddeoa/stream/2.1.0/Stream to create a stream of sequential values to base things one, rather than simple List.range 1 x. Not sure if this needs support from elm-benchmark, though.

I'm going to play around and see if I can set up some useful benchmarks using that. I'll post results here.

zwilias commented 7 years ago

Thinking about this some more; I think @kana-sama is going straight to the core of things with his fold proposal:

Benchmark.fold : 
    String 
    -> (b -> c) 
    -> (a -> (a, b)) 
    -> a 
    -> Benchmark a
Benchmark.fold name consumer generator source = [...]

I'd steer clear from naming this fuzz, as that implies minimalization of values on failure, which isn't the case here. The only special thing about this is that it keeps track of a generator/source of type a.

Example for Random.Pcg:

import Random.Pcg exposing (int, maxInt, map, Generator, Seed, initialSeed)

assocGenerator : Generator (Int, ())
assocGenerator =
    int 0 maxInt
        |> map (\key -> (key, ())

assocListOfSize : Int -> Seed -> (Seed, List (Int, ())
assocListOfSize size seed =
    List.range 1 size
        |> List.foldl
            (\(seed, result) ->
                let
                    (assoc, newSeed) = Random.step assocGenerator seed
                in
                    (newSeed, assoc :: result)
            )
            (seed, [])

myBenchmark : Benchmark Seed
myBenchmark =
    Benchmark.fold 
        "100 random keys"
        Dict.fromList
        (assocListOfSize 100)
        (initialSeed valueFromFlags)

Or an alternative with a Stream of natural numbers that wraps around:

import Stream exposing (Stream, naturalNumbers, limit, cycle, map)

seqStream : Stream (Int, ())
seqStream =
    naturalNumbers
        |> limit 9007199254740991
        |> cycle
        |> map (flip (,) ())

myBenchmark : Benchmark (Stream (Int, ()))
myBenchmark =
    Benchmark.fold
        "100 sequential keys"
        Dict.fromList
        (Stream.nextN 100)
        seqStream

wdyt?

zwilias commented 7 years ago

To give a real world example of the javascript engine optimising something and leading to impossible/weird results; which this discussion pertains to: https://ellie-app.com/FssWM2pTzka1/0

Try toggling that first benchmark to see how skewed the results become, because of chrome's JIT optimising the first operation during the sizing phase.

BrianHicks commented 7 years ago

before we talk too much about the API here, let's get some concrete use cases for this. I want to get 5-10 before committing to an API so we don't have to release too many breaking changes. I think I know but: what would you use this for, concretely? More details are better.

BrianHicks commented 6 years ago

moved to elm-explorations/benchmark#3

BrianHicks / elm-benchmark

Fuzzy benchmarks #7