Closed Lonami closed 3 months ago
The compiler ought to be able to optimize that yeah but I'll have to check. I will be away for a week so may not be able to check properly until then.
Unfortunately "ought to optimize it" is not a good thing to rely on when it can abort the entire process (not my screenshot):
At first I wanted to make all buffers a Vec, before I realized they were already Boxed.
Initialization seems fine, as we create the Box directly, and that's probably as safe as we can be it will be done directly on the heap.
But this may run on the stack and then be copied depending on optimization level.
The person I mentioned sent me an update, with a screenshot too:
The default was called in
new
function ofDictOxide
So that's quite unfortunate. I am not sure if we have ways to initialize the structure on the heap directly in a guaranteed way without unsafe
.
There's certainly been discussion on this before https://users.rust-lang.org/t/how-to-create-large-objects-directly-in-heap/26405.
Instead of simply changing the reset
function to replace self
, I've changed the entire structure with three Vec
and removed the Box
.
I don't know if we can do much better without unsafe
(but I saw this crate bans unsafe
).
I guess three boxed slices is better.
I asked them to try with my fork:
[patch.crates-io]
miniz_oxide = { git = "https://github.com/Lonami/miniz_oxide.git", branch = "reset-hashbuffers-inplace" }
and:
Yep can confirm, no more stack overflow in both debug and release mode
Let me know if I should reset the last commit. 85KB is quite a bit to allocate on the stack in case the optimization fails again. So while this makes the object another usize large (to store the slice's length), I think it might be good to do it anyway for consistency.
I don't think we can use the method in those commits without introducing unsafe or drastically worsening performance. I can't easily profile until next week as I'm not at home but losing the length as part of the type will likely prevent the compiler from optimizing several bounds checks in the performance critical sections of the code.
Is the person having stack overflow issues having them in release mode?
Anyhow, we used Box::default as a workaround as Box::new didn't optimize out the stack copy in the past, but that's probably quite outdated now with stuff like this change, and the current rust sdl lib implementation of box::default calls box::new so I think that needs to be re-checked any maybe that would help here. (the reason why it helped in the past was that box::default used the old special box keyword directly on the structs default trait implementation previously)
without introducing unsafe
Yep, it's quite unfortunate. But until Rust stabilizes a solution, I think we should choose either safety or performance.
Maybe this change should be under #[cfg(target_os = "windows")]
? As I think Windows stack size might be smaller by default.
But, aborting the process on overflow isn't really better than a performance hit.
or drastically worsening performance
Let's wait on the benchmarks. Perhaps we can add assert_eq!(buffer.len(), EXPECTED)
at the start of the functions and hope that helps.
Is the person having stack overflow issues having them in release mode?
From what I understood, yes.
This approach can potentially be helpful, as it presumably allows to construct a boxed array without blowing the stack: https://github.com/rust-lang/rust/issues/63291#issuecomment-1281342953
As noted I will look into whether using new instead of default helps solve this and whether I can reproduce the stack overflow in release on windows but if not that method seems reasonable as it seems to avoid the other issues and just requires bumping the minimun rust version to 1.56.0.
Are you sure this is happening in release mode/with optimizations turned on? When looking at the generated assembly output in release mode I can only see the stack pointer being incremented by 28 bytes in the DictOxide::new()
function, which ends up being optimized to just a alloc and memset calls with the underlying object functions inlined. So while the current init code isn't completely optimal and can be reworked a bit now that rust properly seem to optimize Box::new()
properly which it hasn't always done, it already seem to be optimizing out the stack copies fine.
I also couldn't get it to trigger a stack overflow when running the unit tests on windows. that said with optimizations off it may end up using a bit more stack space which of course will add up in a larger application using stack space elsewhere which I guess might be the issue here. It looks like it's running in debug mode (which normally doesn't have optimization active) at least in the screenshots.
I haven't tried reproducing the problem. All I know is the person that had the problem before, does not have it after this PR.
Some more info.
The person was testing in a Windows server. It did not have the latest version of rustc
. After update they no longer get the stack overflow in release mode, but they still get it in debug mode.
Unfortunately they didn't check the rustc
version prior to upgrading.
It would still be nice to have library work in debug mode too...
Yeah it works fine in debug mode normally. It's more an issue with how Rust works than this library in particular that compiling with no optimizations results in a lot of stack usage so a few tens of kilobytes of stack usage in many different libraries can add up in a large project.
I will have to look into whether it's an issue in release in older versions.
My own library indirectly depends on miniz via
use flate2::write::GzDecoder;
. A user reported to me a stack overflow, which seems to come from the default implementation. I don't know if they're using Windows, but it might be similar to #21.I don't actually know for sure if
reset
is the problem, but it seemed like it doesn't crash immediately, so it might be thereset
as opposed to initializing the buffers the first time around.I have kept the
#[inline]
, but I don't know if it still makes sense (hopefully the optimizer can emit a singlememset
; I'm having trouble running the benchmarks on Windows).