Harden read buffer pool

vlovich commented 3 months ago

Had a problem where I was pointing to the contents of a ReadResult that I'd dropped (yes, unsafe was involved). Running under ASAN didn't do anything because the underlying memory is still allocated within Glommio.

When building in debug, fill the dropped buffer with a sentinel so that it's more obvious that it's been pointed to (in release could maybe fill just the leading 4 bytes with 0xdeadbeef instead of the full memory region, or control this behavior more fine-grained with feature gates)
Annotate __asan_poison_memory_region when the buffer is returned to the pool when building with ASAN to catch these issues.
Investigate if we put a dropped buffer to the head or tail of the pool (most recently returned buffer should be returned last from the pool to make immediate reuse less likely & thus more likely to trigger ASAN)

No immediate plans to implement this but soliciting feedback on these ideas. @glommer thoughts?

glommer commented 3 months ago

I don't like 3 very much. Memory issues and "likely" are an explosive combination. Both 1 and 2 seem like good options. If I understand your plan correctly, 2 seems like the best, as it would work in a release binary as well.

vlovich commented 3 months ago

It's more about a defense in depth strategy since fundamentally there's no way to completely bullet proof catching memory issues due to unsafe & it gets worse when there's a small buffer pool being reused.

Can you help me understand the concerns with option 3? I believe it's a common technique for hardened allocators and I can't imagine there's too many issues: https://llvm.org/docs/ScudoHardenedAllocator.html. Not quite the full Quarantine option but it's ostensibly roughly the same idea, just "0-cost".

Another hardening technique that popped into my head that's I think my own invention is to mmap the buffer pool into many virtual buffer pools. I believe ASAN works on the virtual address, which means then your allocator can ping-pong between different pools to delay reusing the same address almost arbitrarily further. There's some performance cost obviously from the TLB pollution, but probably negligible in practice & not something that would be on unless explicitly opted into / building with ASAN. Not saying I'd ever do it, but fun nerd snipe to rumminate on.

And yes, anything expensive or potentially expensive should be behind a default-off feature flag (e.g. harden-memory-allocator) that is only on if you explicitly turn it on or build with ASAN.

DataDog / glommio

Harden read buffer pool #643