deus-x-mackina / poker

A Rust crate for evaluating poker hands
MIT License

Static evaluation is slower than dynamic evaluation? #10

Open · vtjeng opened this issue 1 year ago

vtjeng commented 1 year ago

I ran the benchmarking code and found the following surprising result:

$ cargo bench --features static_lookup
[...]
   Compiling criterion v0.5.1
    Finished bench [optimized] target(s) in 40.91s
     Running benches/evaluation.rs (/home/vtjeng/development/vtjeng/poker/target/release/deps/evaluation-eb683cdeab2cd6cc)
Gnuplot not found, using plotters backend
Evaluator::new()        time:   [119.11 µs 120.38 µs 122.05 µs]
                        change: [+0.2207% +1.1306% +2.1394%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

eval/dynamic            time:   [37.473 ms 37.534 ms 37.623 ms]
                        change: [-7.7382% -5.9239% -4.2632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high severe
eval/static             time:   [68.655 ms 69.415 ms 70.157 ms]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild

This remains the case when I increase the number of samples to 1000:

$ cargo bench --features static_lookup
   Compiling poker v0.5.1 (/home/vtjeng/development/vtjeng/poker)
    Finished bench [optimized] target(s) in 21.23s
     Running benches/evaluation.rs (/home/vtjeng/development/vtjeng/poker/target/release/deps/evaluation-eb683cdeab2cd6cc)
Gnuplot not found, using plotters backend
Evaluator::new()        time:   [116.23 µs 116.55 µs 116.94 µs]
                        change: [-3.4890% -2.6028% -1.8244%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

Benchmarking eval/dynamic: Warming up for 3.0000 s
Warning: Unable to complete 1000 samples in 5.0s. You may wish to increase target time to 37.7s, or reduce sample count to 130.
eval/dynamic            time:   [37.835 ms 37.906 ms 37.985 ms]
                        change: [+0.6966% +1.0511% +1.3829%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 39 outliers among 1000 measurements (3.90%)
  17 (1.70%) high mild
  22 (2.20%) high severe
Benchmarking eval/static: Warming up for 3.0000 s
Warning: Unable to complete 1000 samples in 5.0s. You may wish to increase target time to 69.1s, or reduce sample count to 70.
eval/static             time:   [69.046 ms 69.114 ms 69.190 ms]
                        change: [-0.8115% -0.0999% +0.4961%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 28 outliers among 1000 measurements (2.80%)
  23 (2.30%) high mild
  5 (0.50%) high severe

Am I misinterpreting the benchmark results or have you made improvements to the dynamic evaluation such that it's now better than the static evaluation?

deus-x-mackina commented 1 year ago

Thank you for raising this issue!

It's indeed unexpected that the static evaluation appears to be slower than the dynamic lookup. I recall that when static evaluation was first implemented, the performance was as you've noted, although I didn't scrutinize it at the time. The gap may come down to the hashing algorithms: FxHashMap's hasher may simply be faster than PHF's, even though both lookups are O(1).
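
For illustration only (this is not how this crate's tables are actually laid out), here is a minimal sketch contrasting the two lookup styles, assuming the rustc-hash and phf crates (with phf's macros feature):

use rustc_hash::FxHashMap;

// Static, allocation-free table baked into the binary via a perfect hash
// function (the style of lookup behind the `static_lookup` feature).
static STATIC_TABLE: phf::Map<u32, u16> = phf::phf_map! {
    0u32 => 1u16,
    1u32 => 2u16,
};

fn main() {
    // Dynamic table: built and heap-allocated at runtime, hashed with FxHasher.
    let mut dynamic_table: FxHashMap<u32, u16> = FxHashMap::default();
    dynamic_table.insert(0, 1);
    dynamic_table.insert(1, 2);

    // Both lookups are O(1); any speed difference comes from the hashers'
    // constant factors and memory layout, not from asymptotic complexity.
    assert_eq!(dynamic_table.get(&1u32), Some(&2));
    assert_eq!(STATIC_TABLE.get(&1u32), Some(&2));
}

In other words, the asymptotic cost is identical in both cases; any measured difference reflects per-lookup constant factors.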

It's worth noting, however, that the static lookup was originally conceived as an allocation-free option rather than a faster one.

Your observation also touches on the sensible intuition that stack-allocated objects (e.g. [T; N]) typically perform better than heap-allocated ones (e.g. vec![T; N]).
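
As a toy illustration of that distinction (not code from this crate):

fn main() {
    // Stack-allocated: size known at compile time, no allocator involvement.
    let on_stack: [u16; 4] = [1, 2, 3, 4];

    // Heap-allocated: requires an allocation plus a pointer indirection.
    let on_heap: Vec<u16> = vec![1, 2, 3, 4];

    // Same contents either way; only the storage location differs.
    assert_eq!(on_stack.as_slice(), on_heap.as_slice());
}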

Moving forward, I'll investigate whether the performance discrepancy is due to the hashing algorithm and explore ways to speed up the PHF lookup. If we can't find a fix, we should at least document this behavior, since users would naturally expect the static variant to be faster.

vtjeng commented 1 year ago

Thanks for the response! I filed this issue partly because I was trying to figure out whether I was running your benchmark wrong: https://github.com/elliottneilclark/rs-poker is ~1000x faster by their benchmarks, which state that seven-card hand evaluation will rank a hand in < 25 ns. Is a per-hand evaluation time on the order of 10 ms correct for this package? (I can file a separate issue if you'd like to keep the discussions separate.)

deus-x-mackina commented 1 year ago

The time on the order of 10 ms is not per hand: it is the time required to evaluate every single combination of 5 cards that can be drawn from a 52-card deck. From the benchmarking code:

// All "52 choose 5" = 2,598,960 five-card hands, generated once up front
// so that only evaluation is timed.
let gen = Card::generate_deck().combinations(5).collect::<Box<_>>();

let routine = {
    /* ... */
    move || {
        // Evaluate every pre-generated hand once per benchmark iteration.
        for cards in gen.iter() {
            let _ = eval.evaluate(cards);
        }
    }
};

group.bench_function("dynamic", |b| b.iter(&routine));

Thus, the benchmark measures the time it takes to evaluate all "52 choose 5" = 2,598,960 possible 5-card hands.

Therefore, the time to evaluate a single 5-card hand is approximately 37 ms divided by ~2.5 million hands, which works out to around 15 ns per hand.
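
As a purely illustrative back-of-the-envelope check of that figure:

fn main() {
    let total_ms = 37.5_f64;       // measured time for the full sweep (ms)
    let hands = 2_598_960.0_f64;   // C(52, 5) five-card hands
    let ns_per_hand = total_ms * 1_000_000.0 / hands;
    println!("~{ns_per_hand:.1} ns per hand"); // prints ~14.4 ns per hand
}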

There are also a few enhancements we could consider to add more depth to our benchmarks (one possibility is sketched below).

As you've suggested, this improvement merits a separate issue. I'll go ahead and open it up!
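
For instance, here is a hypothetical sketch (not taken from this repository's benchmarks) of what a per-hand benchmark could look like, assuming criterion and the crate's public Evaluator and Card API:

use criterion::{criterion_group, criterion_main, Criterion};
use poker::{Card, Evaluator};

// Hypothetical benchmark sketch: time a single 5-card evaluation so the
// per-hand cost is reported directly instead of the full 2.6M-hand sweep.
fn single_hand(c: &mut Criterion) {
    let eval = Evaluator::new();
    // Any fixed 5-card hand will do; here we just take the first five cards
    // of a freshly generated deck.
    let hand: Vec<Card> = Card::generate_deck().take(5).collect();
    c.bench_function("eval/single_hand", |b| b.iter(|| eval.evaluate(&hand)));
}

criterion_group!(benches, single_hand);
criterion_main!(benches);

Run with cargo bench as usual; criterion would then report per-hand times directly, which would make comparisons with rs-poker's numbers straightforward.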