apache / cloudberry

One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0
417 stars 104 forks source link

fix incorrect first Tid during index scan which using bitmap index #679

Closed gongxun0928 closed 1 month ago

gongxun0928 commented 1 month ago

Change logs

the _bitmap_findnexttids function has issues. It incorrectly skips data in some cases

The BMIterateResultstructure can store a maximum of 16 * 1024 = 16,384 TIDs, and in each iteration, we can return at most 16,384 TIDs. However, the BMBatchWordsstructure can store 3,968 words, each of which can be compressed (as specifiedby BMBatchWords hwords; see details in src/backend/access/bitmap/README). The maximum number of TIDs that can be stored in one word is equal to BM_MAX_TUPLES_PER_PAGE, which is an integer multiple of 64 and greater than 16,384.

This PR aims to fix the bug in this case. If words->cwords[0] is compressed and its length exceeds 16,384, the TIDs greater than 16,384 in cwords[0] will be skipped during access.

the first call: param words: { firstTid = 1; cwords[0] = 0x80000200 hwords[0] = 0xc0000000 }

result: { nextTid = 1; lastScanWordNo = 0; nextTids = (0, repeat 16384 times); numOfTids = 0; nextTidLoc 1; } After the first call: word: { firstTid = 1; cwords[0] = 0x80000100 hwords[0] = 0xc0000000 } result: { nextTid = 16385; lastScanWordNo = 0; nextTids = (1, 2, 3,..., 16384); numOfTids = 16384; nextTidLoc = 1; }

the second call: word: { firstTid = 1; cwords[0] = 0x80000100 hwords[0] = 0xc0000000 } result: { nextTid = 16385; lastScanWordNo = 0; nextTids = (1, 2, 3,..., 16384); numOfTids = 16384; nextTidLoc = 16385; }

words->firstTid is still equal to 1, but cwords[0] has already decremented the number of TIDs returned during the first call (0x100 * 64 = 16,384). At this point, firstTid does not match the actual data in cwords, and it is less than result->NextTid. The _bitmap_catchup_to_next_tid function will adjust firstTid, and simultaneously decrement cwords[0] by 16,384, resulting in TIDs 16,385 to 32,768 being skipped.

We need to ensure that firstTid is in the correct position. Therefore, every time we modify cwords, whether iterating by bit within a single word or skipping an entire word, we must synchronize the update of firstTid.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

gfphoenix78 commented 1 month ago

Can we produce some cases that hit this issue?

my-ship-it commented 1 month ago

Please fix CI failures?