oom_instead_of_bump_pointer_overflow fails on Debian armhf

plugwash commented 2 years ago

I have discovered that in a Debian armhf environment the oom_instead_of_bump_pointer_overflow test fails. This was initially noticed on Debian QA infrastructure but I have also tested it locally on both 32-bit and 64-bit kernels and with both the Debian packaged version of bumpalo and the upstream git. All failed.

To troubleshoot the issue I added an extra print statement if the allocation unexpectedly succeeds.

     bump.alloc_layout(layout);
+    eprintln!("alloc_layout succeed unexpectly: p={} size={}", p, size);
 }

And got the following

32-bit kernel: alloc_layout succeed unexpectly: p=3056602143 size=1238365153

64-bit kernel: alloc_layout succeed unexpectly: p=4136635439 size=158331857

I think this is a case of a wrong assumption in the test. Specifically if the allocator choses to make the initial allocation near the top of the address space (which it is perfectly entitled to do) then the follow up allocation may be relatively small and may succeed. But I'd like an opinion from someone more familiar with the code before I go ahead and treat this as a broken test.

fitzgen commented 2 years ago

Yes, this is an incorrect test.

It is true that this allocation will always cause the bump pointer to overflow inside try_alloc_layout_fast, but then we return None there and in try_alloc_layout we fallback to alloc_layout_slow which will allocate a new chunk and this can succeed.

I think we should

Change this test to allow either panic on OOM or successful allocation. Either is a valid behavior here, and we want to keep explicitly checking the code path where we overflow the bump pointer.
Add another test that explicitly allocates usize::MAX or something like that, and asserts that we get an OOM panic.

plugwash commented 2 years ago

Yes, this is an incorrect test.

Thanks for confirming

It is true that this allocation will always cause the bump pointer to overflow inside

i'm not convinced that is true, the calculation in the test seems to assume that bumpalo will make allocations by incrementing the bump pointer, but it seems that allocations are actually made by decrementing the bump pointer.

fitzgen commented 2 years ago

Ah good catch! This test is from before we made the increment-to-decrement switch and should also be updated for that.

plugwash commented 2 years ago

I guess the remaining question is how should the test distinguish between an uncaught bump pointer overflow (bad) and a successful follow-up allocation (good).

And I think the answer to that is by inspecting the returned pointer and seeing if it is a feasible pointer for the allocation size in question.

fitzgen commented 2 years ago

I don't think we need to do anything specific to detect/inspect the returned pointer. Instead, we just need to make sure we write a byte per page in the resulting allocation. We run tests under valgrind in CI and that (or even regular virtual memory protections) should catch any issues if we try to write out of bounds.

This is much easier to get 100% correct than "does this look like a valid pointer?" heuristics.

plugwash commented 2 years ago

Writing a byte per page is not a good idea because the allocation may be HUGE and Linux overcommits memory, so the outcome of doing that may well be that your tests get killed by the OOM killer.

If you don't want to go down the route of checking the returned pointer for sanity then I think a better soloution may be to test "try_alloc_layout_fast" directly.

fitzgen / bumpalo

oom_instead_of_bump_pointer_overflow fails on Debian armhf #128