Open Alexhuszagh opened 3 years ago
I've updated the PR to do the following:

- Renamed `parse_tokenized` to `parse_from_parts`.

This should be a much more satisfactory PR now. The amount of additional code is much smaller, and mostly in unavoidable places.
Hi @Alexhuszagh - thanks for taking the time to take a stab at this. I think the idea is good (and surely +1 for the pre-RFC - let's see how it will be received...).
Before we get into code structure discussions etc., though, I've tried running the bench suite a few times for this branch and master, and the results seem fairly consistent. In particular, for mesh.txt and canada.txt (the latter probably the most representative of what the most 'common' floats to be parsed look like), there's a ~7% performance drop, which is outside of noise bounds.
There might be bias, because the master branch code is heavily hand-tuned for high performance on these benchmarks (on my machine in particular); on the other hand, medians outside of the IQR signify that it's not just noise. Would you perhaps care to run these a few times on your machine to see how the numbers look? If it's indeed significant, we should probably investigate what's causing it, because the code is logically identical - only the order of things may be slightly changed.
master branch:
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| mesh.txt 10.41 10.43 10.49 10.84 11.67 12.89 16.63 |
| canada.txt 21.77 21.83 22.57 23.41 24.41 26.21 33.01 |
| uniform 19.56 19.82 20.93 22.75 26.15 31.25 43.28 |
| one_over_rand32 29.24 29.32 30.64 31.83 33.54 37.35 56.22 |
| simple_uniform32 19.53 19.58 20.03 20.82 22.16 24.80 45.92 |
| simple_int32 15.89 15.92 16.20 17.20 18.22 19.93 31.27 |
| int_e_int 27.28 27.38 28.40 29.80 31.39 36.58 51.47 |
| simple_int64 25.68 25.73 26.61 27.62 29.14 31.91 52.64 |
| bigint_int_dot_int 40.76 40.86 41.89 43.28 45.07 48.16 58.14 |
| big_ints 83.95 85.00 87.83 90.10 92.70 98.71 117.15 |
this branch:
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| mesh.txt 11.07 11.15 11.25 11.62 12.34 13.67 22.26 |
| canada.txt 23.44 23.52 24.33 25.13 26.17 28.18 37.90 |
| uniform 19.37 19.55 20.99 22.96 25.57 31.03 47.99 |
| one_over_rand32 29.42 29.82 30.91 32.09 33.78 41.18 72.31 |
| simple_uniform32 19.37 19.52 20.03 21.06 22.46 26.23 69.84 |
| simple_int32 16.28 16.34 17.03 18.12 19.52 25.08 35.35 |
| int_e_int 27.17 27.33 28.12 29.35 31.00 36.11 64.15 |
| simple_int64 25.92 26.01 27.11 28.37 30.25 35.24 59.69 |
| bigint_int_dot_int 40.76 41.21 42.49 44.14 46.23 51.62 75.35 |
| big_ints 81.36 83.12 86.57 89.38 92.65 102.98 138.79 |
@aldanor Definitely. This might also be due to bad inlining. I'll try changing a few inlines to `#[inline(always)]` to ensure they're identical to the code they're replacing, since 7% sounds well outside of noise bounds.
Ok, this is definitely slower, and inlining doesn't seem to fix it. I'm getting repeated ~7% performance hits with canada.txt, so I'll try to triage now.
Thanks for testing out :)
Yea, I've spent lots and lots of time (the majority, other than correctness) in this crate trying to speed it up by getting inlining right and avoiding every single bounds check where possible (hence all the weird types in the `common` module), so reshuffling things without profiling will likely slow it down.
But I think it should be doable since it's essentially the same logic.
It's a false positive: when both functions are used in the same binary, it messes with the inlining thresholds. I've added a feature gate `use_tokenized` for the benchmark, and if only `parse` is used, it goes back to normal speeds. Since we can expect a single binary to only use one or the other in a single compilation unit, I think this is fine.
Here are a few sample benchmarks on my end:
Note: All the benchmarks are run on x86_64 Linux, with only tmux and Sublime Text open at a time, with the performance governor set.
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 27.07 27.30 27.68 28.04 28.44 29.54 40.41 |
| lexical 75.74 76.08 76.57 77.07 77.95 80.83 99.25 |
| from_str 207.82 208.48 209.31 210.52 212.41 217.33 245.38 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 24.75 33.86 35.16 35.67 36.12 36.63 36.94 |
| lexical 10.08 12.38 12.83 12.98 13.06 13.15 13.20 |
| from_str 4.08 4.60 4.71 4.75 4.78 4.80 4.81 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 430.61 589.17 611.91 620.63 628.61 637.40 642.84 |
| lexical 175.33 215.35 223.29 225.79 227.28 228.75 229.76 |
| from_str 70.92 80.08 81.93 82.66 83.14 83.47 83.73 |
| |
=====================================================================================
Shows a clear 3% performance hit on the median, and much worse in other cases.
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 28.67 28.74 28.82 28.93 29.45 30.90 36.30 |
| fast-float-tokenized 43.63 43.87 44.24 44.61 45.19 46.91 53.80 |
| lexical 73.79 74.00 74.13 74.54 75.82 78.61 105.16 |
| from_str 211.56 212.06 212.76 213.85 216.63 221.97 247.44 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 27.55 32.40 33.96 34.57 34.70 34.80 34.87 |
| fast-float-tokenized 18.59 21.32 22.13 22.42 22.60 22.79 22.92 |
| lexical 9.51 12.72 13.19 13.42 13.49 13.51 13.55 |
| from_str 4.04 4.51 4.62 4.68 4.70 4.72 4.73 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 479.34 563.81 590.90 601.60 603.83 605.54 606.86 |
| fast-float-tokenized 323.45 370.94 385.14 390.12 393.34 396.62 398.88 |
| lexical 165.48 221.37 229.51 233.46 234.74 235.16 235.81 |
| from_str 70.33 78.43 80.33 81.37 81.79 82.06 82.25 |
| |
=====================================================================================
Disabling `use_tokenized` shows the 8% performance hit is undone (this is consistent):
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 26.36 26.39 26.46 26.64 27.39 28.94 52.07 |
| lexical 73.45 73.51 73.62 74.01 75.21 77.79 90.84 |
| from_str 210.11 210.89 212.18 214.25 219.13 232.37 263.52 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 19.21 34.57 36.51 37.54 37.80 37.89 37.94 |
| lexical 11.01 12.86 13.30 13.51 13.58 13.60 13.61 |
| from_str 3.79 4.31 4.56 4.67 4.71 4.74 4.76 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 334.20 601.57 635.33 653.25 657.77 659.32 660.18 |
| lexical 191.55 223.73 231.39 235.11 236.38 236.72 236.92 |
| from_str 66.04 75.00 79.42 81.22 82.01 82.52 82.82 |
| |
=====================================================================================
Re-enabling `use_tokenized` shows the performance hit is incurred again:
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 27.65 27.82 27.96 28.10 28.52 29.78 34.61 |
| fast-float-tokenized 42.87 43.32 43.79 44.28 45.01 46.22 60.92 |
| lexical 73.77 73.95 74.11 74.53 75.59 77.92 87.04 |
| from_str 201.66 203.34 204.56 205.75 207.54 213.01 242.14 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 28.90 33.59 35.06 35.59 35.76 35.94 36.16 |
| fast-float-tokenized 16.41 21.64 22.22 22.58 22.84 23.09 23.33 |
| lexical 11.49 12.83 13.23 13.42 13.49 13.52 13.56 |
| from_str 4.13 4.70 4.82 4.86 4.89 4.92 4.96 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 502.82 584.54 610.15 619.31 622.28 625.45 629.28 |
| fast-float-tokenized 285.64 376.57 386.61 393.00 397.36 401.72 405.94 |
| lexical 199.92 223.34 230.20 233.49 234.80 235.32 235.88 |
| from_str 71.87 81.71 83.85 84.58 85.07 85.58 86.29 |
| |
=====================================================================================
It might be possible to use `#[inline(always)]` for everything and undo this performance hit, but that seems like a bad strategy and would lead to larger binary sizes (which would be suboptimal for architectures where binary size, not speed, is the primary concern, such as embedded systems).
I got similar results on x86_64 Windows MSVC (slightly more muted, but similar). I'm almost certain this is due to compiler inlining and weird stuff, and not actually going to have any impact on real-world cases.
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 23.61 24.69 26.00 27.35 29.46 33.91 49.25 |
| lexical 61.32 62.75 64.58 67.15 70.93 82.38 116.19 |
| from_str 174.15 180.41 185.68 191.25 199.58 231.57 309.24 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 20.30 29.51 33.95 36.57 38.47 40.52 42.35 |
| lexical 8.61 12.15 14.11 14.90 15.49 15.94 16.31 |
| from_str 3.23 4.32 5.01 5.23 5.39 5.54 5.74 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 353.30 513.50 590.78 636.44 669.42 705.08 737.00 |
| lexical 149.76 211.38 245.48 259.20 269.48 277.35 283.80 |
| from_str 56.27 75.16 87.20 90.99 93.73 96.47 99.92 |
| |
=====================================================================================
With `use_tokenized`:
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 24.10 24.70 25.84 27.00 29.19 35.28 66.45 |
| lexical 60.45 61.91 64.47 68.22 73.44 83.75 127.01 |
| from_str 166.05 172.24 177.34 182.48 191.74 226.01 286.65 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 15.05 28.76 34.27 37.04 38.70 40.49 41.49 |
| lexical 7.87 11.96 13.62 14.66 15.51 16.15 16.54 |
| from_str 3.49 4.43 5.22 5.48 5.64 5.81 6.02 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 261.88 500.53 596.37 644.52 673.50 704.56 721.92 |
| lexical 137.01 208.07 237.06 255.09 269.93 281.10 287.85 |
| from_str 60.71 77.16 90.76 95.36 98.13 101.06 104.79 |
| |
=====================================================================================
With `use_tokenized`:
=====================================================================================
| canada.txt (111126, 1.93 MB, f64) |
|===================================================================================|
| |
| ns/float min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 24.37 25.18 26.43 27.58 29.54 34.87 54.98 |
| fast-float-tokenized 33.08 34.93 36.32 37.98 41.33 53.11 71.55 |
| lexical 65.74 67.02 69.50 72.87 78.00 91.32 128.91 |
| from_str 169.05 175.63 182.66 189.98 201.41 228.94 373.07 |
| |
| Mfloat/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 18.19 28.70 33.88 36.27 37.84 39.73 41.03 |
| fast-float-tokenized 13.98 18.83 24.20 26.33 27.54 28.65 30.23 |
| lexical 7.76 10.96 12.82 13.73 14.39 14.93 15.21 |
| from_str 2.68 4.38 4.97 5.26 5.48 5.70 5.92 |
| |
| MB/s min 5% 25% median 75% 95% max |
|-----------------------------------------------------------------------------------|
| fast-float 316.52 499.43 589.63 631.08 658.48 691.41 713.93 |
| fast-float-tokenized 243.19 327.64 421.10 458.16 479.20 498.52 526.05 |
| lexical 134.99 190.78 223.10 238.90 250.39 259.79 264.72 |
| from_str 46.64 76.17 86.46 91.60 95.29 99.11 102.93 |
| |
=====================================================================================
Specs are similar to before (different CPU, slightly higher clock speed).
Adds support for #17.
This probably needs significant edits prior to merging, but the benchmarks with the primitive tokenizing (it doesn't use an optimized memchr) are still pretty good (it also has to duplicate a lot of work when parsing, so this isn't really that surprising):
I've added tests to ensure the differences in parsing don't lead to correctness issues, as seen here.
The following methods have been added:

In order to share more code, `parse_long_mantissa` has been changed to accept a `Decimal` by-value, and the two internal functions have been added:

Any feedback would be great; I'm probably going to need to refactor a bit to increase code re-use. The API can therefore be used like this:
If an invalid character is found, parsing merely stops early: no warning is given for invalid input, but it will not invoke any unsafe behavior. It assumes the input is valid, and the documentation clearly reflects this: