fitzgen / bumpalo

A fast bump allocation arena for Rust
https://docs.rs/bumpalo
Apache License 2.0
1.34k stars 109 forks source link

Improve performance of `String::push_str` #229

Closed overlookmotel closed 4 months ago

overlookmotel commented 4 months ago

I noticed when using collections::String::push_str with large slices that the performance was very poor compared to std::string::String.

I believe the reason is that std's Vec has a specialised implementation of extend_from_slice for Vec<u8>, but bumpalo's Vec does not (and probably can't without using nightly features).

Consequently, push_str(str) where str is 16 KB is currently equivalent to 16000 x individual calls to push() for each byte.

This PR fixes that by using ptr::copy_nonoverlapping to copy the entire slice in one go.

For a 16 KiB string, it's a 80x speed-up:

time:  [403.92 ns 429.47 ns 453.96 ns]                           
       thrpt:  [36.091 Gelem/s 38.149 Gelem/s 40.563 Gelem/s]
change:
       time:   [-98.836% -98.766% -98.700%] (p = 0.00 < 0.05)
       thrpt:  [+7592.7% +8002.3% +8491.5%]
       Performance has improved.

Also added a test for push_str with a long string, and a benchmark.

I've written the code in quite a verbose style with lengthy "SAFETY" comments. Maybe this is overkill, but you don't know me, so I wanted to make it clear from the code that the change is valid.

overlookmotel commented 4 months ago

This change should also produce similar speed-up on String::from_str_in because it uses push_str internally.

It may also be worth investigating whether from_str_in can get a further speed-up if it doesn't use push_str at all, because the length + offset calculations and call to Vec::reserve in push_str are all extraneous when the String has been freshly created with sufficient capacity. If the compiler inlines push_str, it would hopefully figure that out itself, but I'm not sure if it will. If it doesn't:

pub fn from_str_in(s: &str, bump: &'bump Bump) -> String<'bump> {
  let len = s.len();
  let mut t = String::with_capacity_in(len, bump);
  unsafe {
    ptr::copy_nonoverlapping(s.as_ptr(), t.vec.as_mut_ptr(), len);
    t.vec.set_len(len);
  }
  t
}

Would hugely appreciate it if someone had time to review this PR. It appears to be a significant speed boost with no downside, as far as I can see.

OXC compiler uses bumpalo (to great effect, bumpalo is brilliant) and I imagine this small optimization to bumpalo would have a significant impact on OXC's benchmarks.

fitzgen commented 4 months ago

It looks like there are some test failures in CI: https://github.com/fitzgen/bumpalo/actions/runs/7848854221/job/21528762407?pr=229#step:6:110

overlookmotel commented 4 months ago

My apologies! That's really weird, I was pretty sure I'd covered all the bases. I'll look into it.

And thanks very much for coming back. Sorry I hassled you.

overlookmotel commented 4 months ago

Is it possible to re-run the "Rust / build (stable, --features collections,boxed)" CI job? Am wondering if it could be related to something in nightly, or genuinely 100% my stupidity.

fitzgen commented 4 months ago

Retriggered.

Also possible that this test is failing on main right now, but I don't have time to investigate at the moment.

overlookmotel commented 4 months ago

I've reproduced it locally. Yes, it is failing on main too.

Have opened #230 with the details.

The failure does appear to be completely unrelated to this PR, so would you consider merging this in the meantime?

overlookmotel commented 4 months ago

I've also added a bit to the test to include pushing an empty string.

overlookmotel commented 4 months ago

CI should pass if rebased on top of #232.

fitzgen commented 4 months ago

This should be ready for a rebase.

overlookmotel commented 4 months ago

Thanks. Now rebased on main.

overlookmotel commented 4 months ago

@fitzgen I know I've failed on the follow-on for #232, but could I possibly make a request if you could do a release on crates.io containing the changes in this and #231?

It's a fairly significant speed-boost for commonly used functions, and would be a real help to be able to integrate these changes into OXC.

fitzgen commented 4 months ago

Yeah I can do a release shortly. Thanks for the PRs!

overlookmotel commented 4 months ago

Thanks @fitzgen. That'd be very much appreciated. I hope I can contribute more to bumpalo in future, just right now is not ideal timing for me.

fitzgen commented 4 months ago

FYI 3.15.0 is now published on crates.io

overlookmotel commented 4 months ago

FYI 3.15.0 is now published on crates.io

Amazing! Thank you.