This commit fixes an edge case in the memory accounting done in the SampleReservoir. Previously, in copyRow we updated the destination row (which might be the one coming from the "samples" slice) in place, one datum at a time; if we reached the memory limit in the middle of the row, a "corrupt" row would be left in the samples, which could later lead to an incorrect adjustment of the memory account when evicting that row. In fact, we had a comment warning about this situation, but we probably didn't appreciate the memory accounting drift that could fail the stats collection job in extreme cases (and would trigger the "no bytes in account to release" sentry report).
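For illustration, here is a minimal sketch of the problematic pattern; the datum, memAccount, and copyRowInPlace names are hypothetical simplifications, not the actual SampleReservoir code:

```go
package sketch

import "errors"

// datum and memAccount are hypothetical stand-ins for the real types.
type datum struct{ size int64 }

type memAccount struct{ used, limit int64 }

// grow asks the account for delta more bytes, failing if the limit
// would be exceeded (a negative delta shrinks the account).
func (a *memAccount) grow(delta int64) error {
	if a.used+delta > a.limit {
		return errors.New("memory budget exceeded")
	}
	a.used += delta
	return nil
}

// copyRowInPlace mimics the old behavior: dst (possibly a row already
// stored in the samples slice) is overwritten one datum at a time.
func copyRowInPlace(acc *memAccount, dst, src []datum) error {
	for i := range src {
		if err := acc.grow(src[i].size - dst[i].size); err != nil {
			// dst[0..i-1] already hold the new datums while dst[i:]
			// still hold the old ones: the stored row is "corrupt",
			// and the bytes accounted for it no longer match its
			// actual size, so a later eviction releases the wrong
			// amount from the account.
			return err
		}
		dst[i] = src[i]
	}
	return nil
}
```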
This problem is now fixed by first calculating the "before" and "after" sizes of the modified row by copying it into a staging scratch row, then asking for the necessary memory reservation, and only copying the row into the destination if that reservation is approved. This way, the memory accounting is precise at every point in time. The extra copy per sampled row should be negligible in the grand scheme of things. Additionally, I don't think we need to clear out the scratch row after each call because its datums will be overwritten on the next call to copyRow, so we'll lose the references to the datums shortly (if there is no next call, then the last row is kept in the samples, so by not clearing the scratch we merely double the count of references to those datums).
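A minimal sketch of the fixed flow, reusing the hypothetical datum and memAccount types from the sketch above (again an illustration of the idea, not the actual code):

```go
// copyRowStaged first materializes the new row in a scratch slice, so
// the before/after sizes can be computed without touching dst. In the
// real code the copy into the scratch is what performs the per-datum
// deep copies whose sizes are measured; here the sizes are just fields.
func copyRowStaged(acc *memAccount, scratch, dst, src []datum) error {
	var before, after int64
	for i := range src {
		before += dst[i].size
		scratch[i] = src[i]
		after += scratch[i].size
	}
	// Reserve the full delta up front; dst is untouched on failure.
	if err := acc.grow(after - before); err != nil {
		return err
	}
	// Only once the reservation is approved is the destination
	// mutated, so at every point it is either entirely the old row or
	// entirely the new one, and the account always matches it.
	copy(dst, scratch)
	// The scratch is intentionally not cleared: its datums will be
	// overwritten on the next call, and until then they only add a
	// second set of references to datums the samples already hold.
	return nil
}
```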
Fixes: #128241.
Release note (bug fix): Table statistics collection in CockroachDB could previously run into "no bytes in account to release" errors in some edge cases (when the SQL memory budget, configured via the --max-sql-memory flag, was close to being exhausted). The bug has been present since 21.2 and is now fixed.