FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.21k stars 211 forks source link

[FR] Database restore must make every effort on activating deferred indexes #7269

Open basid-irk opened 1 year ago

basid-irk commented 1 year ago

Database restore are aborted on first error of first problematic deferred index. Database restore must make every effort on activating deferred indexes - build them all and show full problem list for each (#7268)

ibaseru commented 1 year ago

I've made a confirmation test. Backup rest_bad.fbk is attached. Backup format - Firebird 2.5, it is ok to restore it on 3.0. Here are 6 tables, 3 master and 3 detail. A1 have duplicate value in primary key A2 have "record loss" that B2 foreign key points to A3 have duplicate value in primary key (as A1, but different value)

This backup was made after turning rdb$index_active to "3" and adding/deleting rows.

Currently on restore (2.5, 3.0, 4.0 was not tested) we have stop after FIRST problem with index _gbak:creating indexes gbak:committing metadata gbak: activating and creating deferred index PK_B3 gbak: activating and creating deferred index PK_B2 gbak: activating and creating deferred index PK_B1 gbak: activating and creating deferred index PK_A3 gbak:cannot commit index PK_A3 gbak: ERROR:violation of PRIMARY or UNIQUE KEY constraint "PKA3" on table "A3" gbak: ERROR: Problematic key value is ("ID" = 9) gbak: ERROR:action cancelled by trigger (2) to preserve data integrity gbak: ERROR: Cannot deactivate index used by an integrity constraint gbak:Exiting before completion due to errors

None indices after that are not activated, they still stay in "inactive" state (value 3 in rdb$index_inactive).

If we could have continuation of activating indices, there would be only 3 indices inactive - A1 PK, A3 PK, B2 FK.

Note that this kind of a problem can be fixed only by repetitive actions (b/r, or restore -i and activating indices one by one), which can take very LONG time on big databases. Ability to have inactive only problematic indices may save lot of time in such cases.

rtest_bad.zip

livius2 commented 1 year ago

If implemented, it should not be as a default action, but some kind of switch.

ibaseru commented 1 year ago

If implemented, it should not be as a default action, but some kind of switch.

What kind of incompatibility (or any side effect) you expect, if this will be default action? The initial issue does not expect the error to disappear. Current behavior forces admin to do more work on restored DB - to manually activate indices that were not activated by restore. Who ever will expect that "lot of indices inactive" is better than "as much as possible indices are activated"?

ibaseru commented 1 year ago

If implemented, it should not be as a default action, but some kind of switch.

well, you won, I changed my mind, let it be "some kind of switch".

hvlad commented 1 year ago

The restore code already tries to activate (build) as much indices as possible. It handle index activation errors and continues to the next index, setting RDB$INDEX_INACTIVE = TRUE for the failed index. And here we have a bug - if failed index is system index (such as PK or FK) then such update is prevented by system trigger and whole restore process is stopped.

I.e. there is no question if "this" should be default action or not - it is default action already.

livius2 commented 1 year ago

But the restore itself at the end will fail or will be succesfull? For me it should fail.

But if this is the default action already i will see needs for swith to turn it off. I do not like to spend time on bad restore, only it should fail for me immidietly.

hvlad commented 1 year ago

In such case restore leave database in shutdown state, report:

Database is not online due to failure to activate one or more indices.
Run gfix -online to bring database online without active indices.

and exit with code 2

livius2 commented 1 year ago

Exit code is ok then. But i see that switch to stop restore immidietly will be also helpfull to do not spend time if someone do not need this.

aafemt commented 1 year ago

and exit with code 2

Is it indicated in status vector for services API as well or one have to parse text output?

hvlad commented 1 year ago

AFAIU, exit code of service routine never reported back to the service user. I.e., from service manager POV, there is no error. And this is not unique\specific for gbak services.

AlexPeshkoff commented 1 year ago

It's prety easy to make all non-zero exit codes be reported as errors in services. If really needed.

aafemt commented 1 year ago

Without it services seem to be pretty useless.

AlexPeshkoff commented 1 year ago

On 8/24/22 14:31, Dimitry Sibiryakov wrote:

Without it services seem to be pretty useless.

I think this to be suggested in devel and if nobody sees problems - feel free to add a ticket, I will implement it. Notice - in status vector it will be a notice that service thread failed with exit code NNN, original message with a reason for is unknown at the final point.

GeF0rs commented 3 weeks ago

I've made a confirmation test. Backup rest_bad.fbk is attached.

Please tell me how to make such a test database? How to “break” it? You cannot directly deactivate a primary key.