Closed valarauca closed 10 months ago
Could you run cargo fmt
and fix the new clippy errors?
Squashing locally broke because of the upstream commits, idk how to fix that.
Clippy demanded I do a lot of things outside of the scope of the original change
Hmmm, I'm actually seeing a perf regression with this PR
$ cargo bench --bench="*" -- --baseline=main-2024-01-24
direct (c wrapper)/default
time: [23.918 µs 23.956 µs 24.004 µs]
change: [+0.2632% +0.4314% +0.5961%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
direct (rust impl)/default
time: [30.104 µs 30.139 µs 30.180 µs]
change: [+0.2469% +0.4175% +0.5814%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
inverse (c wrapper)/default
time: [44.610 µs 44.664 µs 44.723 µs]
change: [-2.4267% -0.9378% +0.0711%] (p = 0.16 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
5 (5.00%) high mild
1 (1.00%) high severe
inverse (rust impl)/default
time: [78.354 µs 78.452 µs 78.564 µs]
change: [+5.8279% +6.0227% +6.2028%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low mild
5 (5.00%) high mild
3 (3.00%) high severe
If I revert the clippy/format commit (877afcc) I see both an improvement and a regression:
$ cargo bench --bench="*" -- --baseline=main-2024-01-24
direct (c wrapper)/default
time: [23.948 µs 24.010 µs 24.088 µs]
change: [+0.1347% +0.3540% +0.5856%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
direct (rust impl)/default
time: [29.044 µs 29.085 µs 29.126 µs]
change: [-3.2355% -3.0412% -2.8588%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
inverse (c wrapper)/default
time: [44.673 µs 44.745 µs 44.820 µs]
change: [-2.3927% -0.9049% +0.1063%] (p = 0.19 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
inverse (rust impl)/default
time: [77.100 µs 77.238 µs 77.404 µs]
change: [+4.1364% +4.3271% +4.5071%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
6 (6.00%) high mild
Any idea why?
If I revert the clippy/format commit (https://github.com/georust/geographiclib-rs/commit/877afcc73acb73f0bdc1929787beb95750850789) I see both an improvement and a regression: Any idea why?
My best guess is that the change is within the noise threshold. I commonly see +/-3µs changes locally without a code change (even with n=1000).
I've taken steps to try to reduce this noise (closing applications, changing CPU throttling/power options), but it remains a frustratingly common result.
Noise is a real thing, but the consistency with which I can reproduce the relative behavior leads me to believe this is not only noise in the benchmark.
d8bbc65 - (HEAD) Standardized to usize (20 hours ago)
direct: -3% improvement vs main
inverse: +4% regression vs main
877afcc - (HEAD) Make cargo fmt & cargo clippy happy (16 hours ago)
no (or very small) change in direct vs main
+5% regression in inverse vs main
Are you able to reproduce this?
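To put the percentages above next to the ±3µs noise figure, here is a rough sketch that converts a reported relative change into an absolute difference. It assumes the reported change is (new - baseline) / baseline and that the point estimates can be combined this way, so treat the results as approximate. The helper name `implied_baseline` is made up for illustration.

```rust
/// Given a measured time (µs) and the relative change (%) reported against
/// the baseline, recover the implied baseline time and the absolute delta.
/// Assumes change = (new - baseline) / baseline.
fn implied_baseline(new_us: f64, change_pct: f64) -> (f64, f64) {
    let baseline = new_us / (1.0 + change_pct / 100.0);
    (baseline, new_us - baseline)
}

fn main() {
    // Point estimates for "inverse (rust impl)" copied from the two runs above.
    let (base_a, delta_a) = implied_baseline(78.452, 6.0227); // at 877afcc
    let (base_b, delta_b) = implied_baseline(77.238, 4.3271); // with 877afcc reverted
    println!("at 877afcc:       baseline ~{base_a:.2} µs, delta ~{delta_a:+.2} µs");
    println!("877afcc reverted: baseline ~{base_b:.2} µs, delta ~{delta_b:+.2} µs");
}
```

Under those assumptions both runs imply roughly the same baseline for the inverse benchmark, which fits the observation that the relative behavior reproduces consistently.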
The regressions from the clippy/fmt commit seem to be from changing the loop iteration.
e.g. this branch https://github.com/georust/geographiclib-rs/tree/mkirk/usize (as of https://github.com/georust/geographiclib-rs/commit/20a2b333570905a54296f23928644722fb29a93f) has the same perf characteristics as before your clippy changes.
That is:
Thanks! I included this in #58 (but reverted part of the clippy changes).
Are you able to reproduce this?
Sort of...
Looking at the ASM output link
The main difference I see is that the normal index-based loop has a pretty simple structure:
BELOW_18 -> POLY_SUM (inner loop) -> BELOW_18 OR RET
While the clippy version does
IDK -> BELOW_18 -> POLY_SUM (inner loop) -> IDK | RETURN_OR_PANIC
This makes the clippy version extremely branch heavy, as it seems to frequently branch off, check a bunch of stuff, then return to the inner loop.
Interesting! That might account for it.
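For readers following along, a minimal sketch of the two loop shapes being compared. This is not the crate's actual code: `polyval_indexed` and `polyval_iter` are hypothetical names, and the iterator form is only the style of rewrite that clippy's `needless_range_loop` lint tends to suggest. Whether it reproduces the codegen difference described above has not been checked.

```rust
/// Index-based Horner evaluation over p[s..=s + n]
/// (hypothetical helper, for illustration only).
fn polyval_indexed(n: usize, p: &[f64], s: usize, x: f64) -> f64 {
    let mut y = p[s];
    for i in (s + 1)..=(s + n) {
        // Each access is a plain bounds-checked index.
        y = y * x + p[i];
    }
    y
}

/// Iterator-based variant in the style clippy suggests for range loops.
fn polyval_iter(n: usize, p: &[f64], s: usize, x: f64) -> f64 {
    let mut y = p[s];
    for &c in p.iter().take(s + n + 1).skip(s + 1) {
        y = y * x + c;
    }
    y
}

fn main() {
    // 1*x^3 + 2*x^2 + 3*x + 4 evaluated at x = 2.
    let coeffs = [1.0, 2.0, 3.0, 4.0];
    let a = polyval_indexed(3, &coeffs, 0, 2.0);
    let b = polyval_iter(3, &coeffs, 0, 2.0);
    assert_eq!(a, b);
    println!("{a} {b}");
}
```

Note the two are not strictly equivalent on bad input: the indexed loop panics if the slice is too short, while the iterator version silently stops early.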
Remove instances of i64 & isize when indexing, standardized on usize
CHANGES.md if knowledge of this change could be valuable to users.
Per https://github.com/georust/geographiclib-rs/pull/56#issuecomment-1908740620, attempting to reduce the scope of the PR.
This only addresses moving indexing variables to usize.
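As a rough illustration of what moving indexing variables to usize means in practice, here is a sketch with made-up function names (`sum_tail_signed`, `sum_tail_usize`); it is not taken from the PR's diff.

```rust
/// Before: a signed counter that has to be cast at every index
/// (hypothetical shape, for illustration only).
fn sum_tail_signed(coeffs: &[f64], start: i64) -> f64 {
    let mut total = 0.0;
    let mut i = start;
    while i < coeffs.len() as i64 {
        total += coeffs[i as usize];
        i += 1;
    }
    total
}

/// After: the counter is `usize` throughout, so no casts are needed.
fn sum_tail_usize(coeffs: &[f64], start: usize) -> f64 {
    let mut total = 0.0;
    for i in start..coeffs.len() {
        total += coeffs[i];
    }
    total
}

fn main() {
    let coeffs = [1.0, 2.0, 3.0, 4.0];
    // Both versions sum coeffs[1..]: 2 + 3 + 4.
    assert_eq!(sum_tail_signed(&coeffs, 1), sum_tail_usize(&coeffs, 1));
    println!("{}", sum_tail_usize(&coeffs, 1));
}
```

Besides removing the casts, the usize version cannot be handed a negative start index, which in the signed version would wrap to a huge value on the cast and panic at the index.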