Avoid using default in HashBuffers::reset

Lonami commented 4 months ago

My own library indirectly depends on miniz via use flate2::write::GzDecoder;. A user reported to me a stack overflow, which seems to come from the default implementation. I don't know if they're using Windows, but it might be similar to #21.

I don't actually know for sure if reset is the problem, but it seemed like it doesn't crash immediately, so it might be the reset as opposed to initializing the buffers the first time around.

I have kept the #[inline], but I don't know if it still makes sense (hopefully the optimizer can emit a single memset; I'm having trouble running the benchmarks on Windows).

oyvindln commented 4 months ago

The compiler ought to be able to optimize that yeah but I'll have to check. I will be away for a week so may not be able to check properly until then.

Lonami commented 4 months ago

Unfortunately "ought to optimize it" is not a good thing to rely on when it can abort the entire process (not my screenshot): stack overflow

At first I wanted to make all buffers a Vec, before I realized they were already Boxed.

Initialization seems fine, as we create the Box directly, and that's probably as safe as we can be it will be done directly on the heap.

But this may run on the stack and then be copied depending on optimization level.

Lonami commented 4 months ago

The person I mentioned sent me an update, with a screenshot too:

The default was called in new function of DictOxide

So that's quite unfortunate. I am not sure if we have ways to initialize the structure on the heap directly in a guaranteed way without unsafe.

There's certainly been discussion on this before https://users.rust-lang.org/t/how-to-create-large-objects-directly-in-heap/26405.

Lonami commented 4 months ago

Instead of simply changing the reset function to replace self, I've changed the entire structure with three Vec and removed the Box.

I don't know if we can do much better without unsafe (but I saw this crate bans unsafe).

Lonami commented 4 months ago

I guess three boxed slices is better.

Lonami commented 4 months ago

I asked them to try with my fork:

[patch.crates-io]
miniz_oxide = { git = "https://github.com/Lonami/miniz_oxide.git", branch = "reset-hashbuffers-inplace" }

and:

Yep can confirm, no more stack overflow in both debug and release mode

Lonami commented 4 months ago

Let me know if I should reset the last commit. 85KB is quite a bit to allocate on the stack in case the optimization fails again. So while this makes the object another usize large (to store the slice's length), I think it might be good to do it anyway for consistency.

oyvindln commented 4 months ago

I don't think we can use the method in those commits without introducing unsafe or drastically worsening performance. I can't easily profile until next week as I'm not at home but losing the length as part of the type will likely prevent the compiler from optimizing several bounds checks in the performance critical sections of the code.

Is the person having stack overflow issues having them in release mode?

oyvindln commented 4 months ago

Anyhow, we used Box::default as a workaround as Box::new didn't optimize out the stack copy in the past, but that's probably quite outdated now with stuff like this change, and the current rust sdl lib implementation of box::default calls box::new so I think that needs to be re-checked any maybe that would help here. (the reason why it helped in the past was that box::default used the old special box keyword directly on the structs default trait implementation previously)

Lonami commented 4 months ago

without introducing unsafe

Yep, it's quite unfortunate. But until Rust stabilizes a solution, I think we should choose either safety or performance.

Maybe this change should be under #[cfg(target_os = "windows")]? As I think Windows stack size might be smaller by default.

But, aborting the process on overflow isn't really better than a performance hit.

or drastically worsening performance

Let's wait on the benchmarks. Perhaps we can add assert_eq!(buffer.len(), EXPECTED) at the start of the functions and hope that helps.

Is the person having stack overflow issues having them in release mode?

From what I understood, yes.

DCNick3 commented 3 months ago

This approach can potentially be helpful, as it presumably allows to construct a boxed array without blowing the stack: https://github.com/rust-lang/rust/issues/63291#issuecomment-1281342953

oyvindln commented 3 months ago

As noted I will look into whether using new instead of default helps solve this and whether I can reproduce the stack overflow in release on windows but if not that method seems reasonable as it seems to avoid the other issues and just requires bumping the minimun rust version to 1.56.0.

oyvindln commented 3 months ago

Are you sure this is happening in release mode/with optimizations turned on? When looking at the generated assembly output in release mode I can only see the stack pointer being incremented by 28 bytes in the DictOxide::new() function, which ends up being optimized to just a alloc and memset calls with the underlying object functions inlined. So while the current init code isn't completely optimal and can be reworked a bit now that rust properly seem to optimize Box::new() properly which it hasn't always done, it already seem to be optimizing out the stack copies fine. Screenshot_20240312_192904

I also couldn't get it to trigger a stack overflow when running the unit tests on windows. that said with optimizations off it may end up using a bit more stack space which of course will add up in a larger application using stack space elsewhere which I guess might be the issue here. It looks like it's running in debug mode (which normally doesn't have optimization active) at least in the screenshots.

Lonami commented 3 months ago

I haven't tried reproducing the problem. All I know is the person that had the problem before, does not have it after this PR.

Lonami commented 3 months ago

Some more info.

The person was testing in a Windows server. It did not have the latest version of rustc. After update they no longer get the stack overflow in release mode, but they still get it in debug mode.

Unfortunately they didn't check the rustc version prior to upgrading.

DCNick3 commented 3 months ago

It would still be nice to have library work in debug mode too...

oyvindln commented 3 months ago

Yeah it works fine in debug mode normally. It's more an issue with how Rust works than this library in particular that compiling with no optimizations results in a lot of stack usage so a few tens of kilobytes of stack usage in many different libraries can add up in a large project.

I will have to look into whether it's an issue in release in older versions.

Frommi / miniz_oxide

Avoid using default in HashBuffers::reset #147