Closed TheGOro closed 2 years ago
@TheGOro
Hello, corrupt buffers should be taken care of in RingBuffer::scan()
and I don't see how they can leak to the recovery stage. Do you have any test that proves that this patch fixes some problem? Logical explanation would do ;)
@TheGOro Hello, corrupt buffers should be taken care of in
RingBuffer::scan()
and I don't see how they can leak to the recovery stage. Do you have any test that proves that this patch fixes some problem? Logical explanation would do ;)
The RingBuffer::scan()
function encounters this very issue when it tries to insert an invalid memory address (resulted by the corrupt size field in the previous buffer header) as a pointer into the seqno2ptr
dequeue, and it bails out gracefully due to the catch. RingBuffer::recover()
continues and when scan()
had managed to put at least one entry to seqno2ptr
, then we could reach the "discard all the locked-in buffers" stage, where another iteration is done on the buffer headers starting from first_
. Here we experience the segmentation fault because the size field is attempted to be accessed (in order to update the progress) while the buffer header pointer points to an invalid memory region.
Thanks for reporting and bringing our attention to this issue. It was fixed in the most recent release, but in a different way, notably commits fbf9167c4a0e03df0284f278024a9d41d3633c4e, d91a37310de28979d3f9d95f3b58dbb5e7756bbe, d2baa5bce511bbd052b3f244b6754b48587396af
4bb58377cf3ee02e4c69ce329c2e099b07c79368 did not solve the whole GCache recovery from a corrupted
galera.cache
file use-case, as the scan for discarding locked-in buffers part could experience the same issue, when the size is incorrect in the given buffer header.