georust / geographiclib-rs

A port of geographiclib in Rust.
MIT License
41 stars 9 forks source link

Standardized to usize #57

Closed valarauca closed 10 months ago

valarauca commented 10 months ago

Remove instances of i64 & isize when indexing, standardized on usize


Per https://github.com/georust/geographiclib-rs/pull/56#issuecomment-1908740620

Attempting to reducing the scope of the PR.

This only addresses moving indexing variables to usize.

michaelkirk commented 10 months ago

Could you run cargo fmt and fix the new clippy errors?

valarauca commented 10 months ago

Squashing locally broke because of the upstream commits, idk how to fix that.

Clippy demanded I do a lot of things outside of the scope of the original change

michaelkirk commented 10 months ago

Hmmm, I'm actually seeing a perf regression with this PR

$ cargo bench --bench="*" --  --baseline=main-2024-01-24
direct (c wrapper)/default
                        time:   [23.918 µs 23.956 µs 24.004 µs]
                        change: [+0.2632% +0.4314% +0.5961%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

direct (rust impl)/default
                        time:   [30.104 µs 30.139 µs 30.180 µs]
                        change: [+0.2469% +0.4175% +0.5814%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

inverse (c wrapper)/default
                        time:   [44.610 µs 44.664 µs 44.723 µs]
                        change: [-2.4267% -0.9378% +0.0711%] (p = 0.16 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  1 (1.00%) high severe

inverse (rust impl)/default
                        time:   [78.354 µs 78.452 µs 78.564 µs]
                        change: [+5.8279% +6.0227% +6.2028%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

If I revert the clippy/format commit (877afcc) I see both an improvement and a regression:

$ cargo bench --bench="*" --  --baseline=main-2024-01-24

direct (c wrapper)/default
                        time:   [23.948 µs 24.010 µs 24.088 µs]
                        change: [+0.1347% +0.3540% +0.5856%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

direct (rust impl)/default
                        time:   [29.044 µs 29.085 µs 29.126 µs]
                        change: [-3.2355% -3.0412% -2.8588%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

inverse (c wrapper)/default
                        time:   [44.673 µs 44.745 µs 44.820 µs]
                        change: [-2.3927% -0.9049% +0.1063%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

inverse (rust impl)/default
                        time:   [77.100 µs 77.238 µs 77.404 µs]
                        change: [+4.1364% +4.3271% +4.5071%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  6 (6.00%) high mild

Any idea why?

valarauca commented 10 months ago

If I revert the clippy/format commit (https://github.com/georust/geographiclib-rs/commit/877afcc73acb73f0bdc1929787beb95750850789) I see both an improvement and a regression: Any idea why?

My best guess is the change is within the noise threshold. I commonly see +/-3µs changes without a code change locally (even with n=1000).

I've taken steps to try to reduce this noise (closing applications, changing cpu throttling/power options) but it remains frustratingly common result.

michaelkirk commented 10 months ago

Noise is a real thing, but the consistency with which I can reproduce the relative behavior leads me to believe this is not only noise in the benchmark.

Are you able to reproduce this?

michaelkirk commented 10 months ago

The regressions from the clippy/fmt commit seem to be from changing the loop iteration.

e.g. this branch https://github.com/georust/geographiclib-rs/tree/mkirk/usize (as of https://github.com/georust/geographiclib-rs/commit/20a2b333570905a54296f23928644722fb29a93f) has the same perf characteristics as before your clippy changes.

That is:

michaelkirk commented 10 months ago

Thanks! I included this in #58 (but reverted part of the clippy changes).

valarauca commented 10 months ago

Are you able to reproduce this?

Sort of...

Looking at the ASM output link

The main different I see is the normal index based loop does a pretty simple structure

BELOW_18 -> POLY_SUM (inner loop) -> BELOW_18 OR  RET

While the clippy version does

 IDK -> BELOW_18 -> POLY_SUM (inner loop) -> IDK | RETURN_OR_PANIC

This makes the clippy version extremely branch heavy, as it seems to frequently branch off check a bunch of stuff, then return to the inner loop.

michaelkirk commented 10 months ago

Interesting! That might account for it.