Closed valarauca closed 9 months ago
Could you run cargo fmt
and fix the new clippy errors?
Squashing locally broke because of the upstream commits, idk how to fix that.
Clippy demanded I do a lot of things outside of the scope of the original change
Hmmm, I'm actually seeing a perf regression with this PR
$ cargo bench --bench="*" -- --baseline=main-2024-01-24
direct (c wrapper)/default
time: [23.918 µs 23.956 µs 24.004 µs]
change: [+0.2632% +0.4314% +0.5961%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
direct (rust impl)/default
time: [30.104 µs 30.139 µs 30.180 µs]
change: [+0.2469% +0.4175% +0.5814%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
inverse (c wrapper)/default
time: [44.610 µs 44.664 µs 44.723 µs]
change: [-2.4267% -0.9378% +0.0711%] (p = 0.16 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
5 (5.00%) high mild
1 (1.00%) high severe
inverse (rust impl)/default
time: [78.354 µs 78.452 µs 78.564 µs]
change: [+5.8279% +6.0227% +6.2028%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low mild
5 (5.00%) high mild
3 (3.00%) high severe
If I revert the clippy/format commit (877afcc) I see both an improvement and a regression:
$ cargo bench --bench="*" -- --baseline=main-2024-01-24
direct (c wrapper)/default
time: [23.948 µs 24.010 µs 24.088 µs]
change: [+0.1347% +0.3540% +0.5856%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
direct (rust impl)/default
time: [29.044 µs 29.085 µs 29.126 µs]
change: [-3.2355% -3.0412% -2.8588%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
inverse (c wrapper)/default
time: [44.673 µs 44.745 µs 44.820 µs]
change: [-2.3927% -0.9049% +0.1063%] (p = 0.19 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
inverse (rust impl)/default
time: [77.100 µs 77.238 µs 77.404 µs]
change: [+4.1364% +4.3271% +4.5071%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
6 (6.00%) high mild
Any idea why?
If I revert the clippy/format commit (https://github.com/georust/geographiclib-rs/commit/877afcc73acb73f0bdc1929787beb95750850789) I see both an improvement and a regression: Any idea why?
My best guess is the change is within the noise threshold. I commonly see +/-3µs changes without a code change locally (even with n=1000
).
I've taken steps to try to reduce this noise (closing applications, changing cpu throttling/power options) but it remains frustratingly common result.
Noise is a real thing, but the consistency with which I can reproduce the relative behavior leads me to believe this is not only noise in the benchmark.
d8bbc65 - (HEAD) Standardized to usize (20 hours ago)
direct: -3% improvement vs main
inverse: +4% regression vs main
877afcc - (HEAD) Make cargo fmt & cargo clippy happy (16 hours ago)
no (or very small) change in direct vs main
+5% regression in inverse vs main
Are you able to reproduce this?
The regressions from the clippy/fmt commit seem to be from changing the loop iteration.
e.g. this branch https://github.com/georust/geographiclib-rs/tree/mkirk/usize (as of https://github.com/georust/geographiclib-rs/commit/20a2b333570905a54296f23928644722fb29a93f) has the same perf characteristics as before your clippy changes.
That is:
Thanks! I included this in #58 (but reverted part of the clippy changes).
Are you able to reproduce this?
Sort of...
Looking at the ASM output link
The main different I see is the normal index based loop does a pretty simple structure
BELOW_18 -> POLY_SUM (inner loop) -> BELOW_18 OR RET
While the clippy version does
IDK -> BELOW_18 -> POLY_SUM (inner loop) -> IDK | RETURN_OR_PANIC
This makes the clippy version extremely branch heavy, as it seems to frequently branch off check a bunch of stuff, then return to the inner loop.
Interesting! That might account for it.
Remove instances of i64 & isize when indexing, standardized on usize
CHANGES.md
if knowledge of this change could be valuable to users.Per https://github.com/georust/geographiclib-rs/pull/56#issuecomment-1908740620
Attempting to reducing the scope of the PR.
This only addresses moving indexing variables to
usize
.