arenadata / gpdb

Arenadata DB
https://docs.arenadata.io/en/ADB/current/introduction/intro.html
Apache License 2.0
40 stars 22 forks source link

ADBDEV-6368: Fix temporary schema cleanup after segment failure (#1022) #1062

Closed silent-observer closed 1 week ago

silent-observer commented 1 month ago

Fix orphan temp table on the coordinator (#11383)

When the backend process panics on the segment, temp tables should be dropped on the coordinator. Before this, QD only resets the gang upon detecting the QE crash but not dropping temp tables because it's still in a transaction block at that point.

Changes from original commit:

  1. No renaming CheckForResetSession into GpDropTempTables due to ABI compatibility.
  2. ResetAllGangs() is missing on GPDB 6, no changes necessary.
  3. Fix autovacuum issue by using gpcheckcat instead.

Co-authored-by: Gang Xiong gangx@vmware.com Co-authored-by: Adam Lee adlee@vmware.com

(cherry picked from commit https://github.com/arenadata/gpdb/commit/e88ceb8e399074189160a8a2a2eb906bdfebbb12)


Fix orphaned temp namespace catalog entry left on coordinator

Note that, commit e88ceb8e399074189160a8a2a2eb906bdfebbb12 fixed orphaned temp table left on coordinator, but it did't handle orphaned namespace catalog entry left on coordinator. This commit fix that.

(cherry picked from commit 0dcc97c76e9b7f54b1a5361fec970bd36a6e4743)


Fix orphaned temp table on coordinator

When session exits within a transaction block, temp table created in this session will be left. This is because ResetAllGangs will reset myTempNamespace, normal temp table catalog clean up won't be called. More details could ref https://github.com/greenplum-db/gpdb/issues/16506

Noted that, when a session is going to exit on coordinator, there is no need to drop temp tables by calling GpDropTempTables, callback function will do that at the end.

Changes from original commit:

  1. ResetAllGangs() is missing, changed the logic to equivalent.
  2. Fix test output (CREATE TABLE).

(cherry picked from commit c89fada973ee975efe8f56f543013ee5068fad8b)


After some segment failures (such as during long insert operations) temporary schema wasn't deleted even after the session exits. This error was caused by a complicated interaction between two places that delete temporary tables: RemoveTempRelationsCallback and GpDropTempTables.

This patch refactors these two functions to make their actions more clear. GpDropTempTables is split into two functions: GpResetSessionIfNeeded and GpDropTempTables, and RemoveTempRelationsCallaback reuses GpDropTempTables. There is also a new function GpHasTempRelationsForDeletion that is used to check if any cleanup is necessary. Cancelling the RemoveTempRelationsCallback is also moved from ResetTempNamespace (which doesn't fully delete it yet, so the callback might still be needed) to GpDropTempTables. With this the issue is now fixed.

Changes from original commit:

  1. Function equivalent to GpDropTempTables in GPDB 7 was named CheckForResetSession in GPDB 6. We keep it for backwards compatibility, now calling GpResetSessionIfNeeded and GpDropTempTables in it (same as in GPDB 7).
  2. ResetAllGangs is missing in GPDB 6, no fix needed.
  3. postgres.c already uses CheckForResetSession in GPDB 6, no fix needed.
  4. Add a delay before checking whether the table was dropped.

(cherry picked from commit 465f4699b84930f439b75d1118657530e38414a1)


Note: do not squash the commit to preserve authorship.

BenderArenadata commented 1 month ago

Allure report https://allure.adsw.io/launch/81533

RekGRpth commented 4 weeks ago

Changes from original commit:

1. Add orphan_temp_table isolation test from GPDB 7 in its entirety.

2. Change fault injector from before_exec_scan (unavailable in GPDB 6) to
   create_function_fail.

(cherry picked from commit 465f469)

Can you also describe the other changes compared to the original patch? For example, about CheckForResetSession, ResetAllGangs and postgres.c

silent-observer commented 3 weeks ago

Can you also describe the other changes compared to the original patch?

Fixed

BenderArenadata commented 3 weeks ago

Allure report https://allure.adsw.io/launch/81973

BenderArenadata commented 2 weeks ago

Allure report https://allure.adsw.io/launch/82900

silent-observer commented 2 weeks ago

The test seems to be flaky, should be reworked

BenderArenadata commented 2 weeks ago

Allure report https://allure.adsw.io/launch/82978

RekGRpth commented 2 weeks ago

commit e88ceb8 fixed orphaned temp table left on coordinator

Where is the cherry-pick of this commit in 6X?

silent-observer commented 2 weeks ago

Where is the cherry-pick of this commit in 6X?

This is impossible to cherry-pick because it would break ABI compatibility (the function CheckForResetSession was renamed). The only other change that commit made was change NeedResetSession logic, which is not necessary after the refactor (compare this to GPDB 7 code).

RekGRpth commented 2 weeks ago

Where is the cherry-pick of this commit in 6X?

This is impossible to cherry-pick because it would break ABI compatibility (the function CheckForResetSession was renamed). The only other change that commit made was change NeedResetSession logic, which is not necessary after the refactor (compare this to GPDB 7 code).

That patch added a new test, but for some reason yours is added in a different commit!

silent-observer commented 2 weeks ago

That patch added a new test, but for some reason yours is added in a different commit!

Okay, I will cherry pick just the test from that commit.

RekGRpth commented 2 weeks ago

That patch added a new test, but for some reason yours is added in a different commit!

Okay, I will cherry pick just the test from that commit.

Why only a test? I think it's worth cherry-pick the entire patch, making the necessary changes so that just one commit would work.

BenderArenadata commented 1 week ago

Allure report https://allure.adsw.io/launch/83254

BenderArenadata commented 1 week ago

Allure report https://allure.adsw.io/launch/83350

BenderArenadata commented 1 week ago

Allure report https://allure.adsw.io/launch/83391

RekGRpth commented 1 week ago

Note that, commit e88ceb8 fixed orphaned temp table left on

Maybe we need to change the commit https://github.com/arenadata/gpdb/pull/1062/commits/0ccd9551a82474147d19eaa0574199a290a2b429 link?

BenderArenadata commented 1 week ago

Allure report https://allure.adsw.io/launch/83548

BenderArenadata commented 1 week ago

Allure report https://allure.adsw.io/launch/83669