FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/

Database file appears corrupted after restore from backup #7942

Closed: gsbelarus closed this issue 7 months ago

gsbelarus commented 10 months ago

There are multiple Firebird instances on the server: FB25, FB3, and FB5. Each instance has its own dedicated port and service name. The third-party application used to perform the restore connects to the server over TCP, using the connection string localhost/3056:some_path_to_database.
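
For illustration only, a minimal sketch of how the instance listening on port 3056 might be configured, assuming each instance has its own firebird.conf. `RemoteServicePort` and `IpcName` are standard firebird.conf parameters. The IpcName value is hypothetical, and the idea that the instances may share the default IpcName (which would fit the XNET message in firebird.log) is an assumption, not something stated in this report.

```ini
# firebird.conf of the instance that should listen on TCP port 3056
# (hypothetical values; the report only says each instance has its own port)

# TCP port used by INET connections such as localhost/3056:some_path_to_database
RemoteServicePort = 3056

# Name of the local (XNET) IPC objects on Windows; the default is FIREBIRD.
# Assumption: if several instances keep the default name, only the first one
# can initialize its XNET listener, consistent with "XNET server
# initialization failed. Probably, another instance ... is already running".
IpcName = FIREBIRD_5
```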

At the end of the restoration process, the following error message appears:

Unable to complete network request to host "localhost". Error reading data from the connection.

Additionally, firebird.log contains this entry:

XNET error: XNET server initialization failed. Probably, another instance of the server is already running.

The resulting database file appears to be corrupted; a subsequent gfix -v -full ... reports:

Number of record level errors : 18722

gsbelarus commented 8 months ago

@hvlad there is a warning in the OS system log:

Fault bucket 1440612613330618035, type 5
Event Name: RADAR_PRE_LEAK_64
Response: Not available
Cab Id: 0

Problem signature:
P1: firebird.exe
P2: 5.0.1.1369
P3: 10.0.17763.2.0.0
P4: 
P5: 
P6: 
P7: 
P8: 
P9: 
P10: 

Attached files:
\\?\g:\temp\RDRB702.tmp\empty.txt
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERB703.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERB713.tmp.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERB731.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERB741.tmp.txt

These files may be available here:

Analysis symbol: 
Rechecking for solution: 0
Report Id: a8edd257-1b38-4aee-a1d3-fc1b6db57ca8
Report Status: 268435456
Hashed bucket: 632b11fed92f1964d3fe158a0484c6b3
Cab Guid: 0

gsbelarus commented 8 months ago

The process started at 1:53. At 2:22 the warning above was added to the OS log, and the process finished at 4:30.

Contrary to what we saw a few months ago, this time, even though gbak still finished with the error message, the subsequent gfix run found no corruption of the database structure.

hvlad commented 8 months ago

> @hvlad there is a warning in the OS system log:
>
> Fault bucket 1440612613330618035, type 5
> Event Name: RADAR_PRE_LEAK_64

You may ignore it. See, for example

https://answers.microsoft.com/en-us/windows/forum/all/radarpreleak/372855fb-1285-4f99-a352-4fd517e9b680 :

Radar_Pre_Leak is technical nomenclature that means the OS has detected (Radar) a process that is not handling it routines in a manner that it thinks is efficient (Pre) and as a result it may begin to exhaust memory resources (Leak), i.e. it is a resource hog. It is entirely informational event, though, and does not mean it is necessarily running outside its bounds or has exhausted the limits of Windows memory management.

hvlad commented 8 months ago

> Contrary to what we saw a few months ago, this time, even though gbak still finished with the error message, the subsequent gfix run found no corruption of the database structure.

And the empty validation log confirms it.

So, what is the current state of things?

I see that the restore still has some problem on detach, or when reconnecting after a successful restore: there is no 'adjusting the ONLINE and FORCED WRITES flags' message, and the database remains in multi-user maintenance mode.

gsbelarus commented 8 months ago

Well, I ran the restore using plain gbak, without the -se switch, and got this error:

21:32:35,79 C:\Program Files\FB5>gbak -r g:\XXXXXXXXXXX.bk g:\XXXXXXXXXXXXXX.fdb -user sysdba -pas XXXXXXXXX -z -v > g:XXXXXXXXXXXXX.txt
gbak: ERROR:action cancelled by trigger (3) to preserve data integrity
gbak: ERROR:    Cannot deactivate index used by a PRIMARY/UNIQUE constraint
gbak:Exiting before completion due to errors

Last records from the log file:

gbak:    activating and creating deferred index RDB$PRIMARY1477
gbak:    activating and creating deferred index RDB$PRIMARY1495
gbak:    activating and creating deferred index RDB$PRIMARY1474
gbak:    activating and creating deferred index RDB$PRIMARY1466
gbak:    activating and creating deferred index RDB$PRIMARY1501
gbak:    activating and creating deferred index RDB$PRIMARY1498
gbak:    activating and creating deferred index RDB$PRIMARY1494
gbak:cannot commit index RDB$PRIMARY1494
gbak: ERROR:invalid database handle (no active connection)

New entries in firebird.log:

LEPEL   Tue Mar 19 01:59:32 2024
    Failed to create worker attachment

    database G:\XXXXXXXXXXXX.FDB shutdown

LEPEL   Tue Mar 19 01:59:33 2024
    Failed to create worker attachment

    database G:\XXXXXXXXXXXX.FDB shutdown

hvlad commented 8 months ago

Could you provide me with this backup, so we can avoid endless guessing and fix the issue?

hvlad commented 7 months ago

Anything?

gsbelarus commented 7 months ago

> Anything?

Is there an email address or messenger through which we can discuss this? It will require arranging access to the database somehow.

hvlad commented 7 months ago

Sure, email me at hvlad at users sf com.

gsbelarus commented 7 months ago

OK. I will try another restore on a different machine, using a backup file from a different day. If an error occurs, I will get in touch.

gsbelarus commented 7 months ago

I spent half the week testing on three different machines and discovered something interesting: the database restore crashes when the server mode is set to Classic, regardless of whether the -se switch is used on the gbak command line. When the server mode is set to Super, the database restores without any problems.
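
For reference, the server mode is selected by the `ServerMode` parameter in firebird.conf (Firebird 3.0 and later; valid values are Super, SuperClassic and Classic). A minimal sketch of switching an instance to Super, the mode in which the restore completed in these tests; the server has to be restarted for the change to take effect.

```ini
# firebird.conf (Firebird 3.0+): process model of the server.
# Classic: a separate server process per attachment (the restore crashed here).
# Super:   one multi-threaded server process (the restore completed here).
ServerMode = Super
```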

I will try to obfuscate the database and provide a copy to @hvlad for further investigation.

hvlad commented 7 months ago

Good to know, thanks

hvlad commented 7 months ago

The issue reproduces in Classic mode; a fix will follow soon.

hvlad commented 7 months ago

The fix is committed into v5; please check the next snapshot build. The bug indeed doesn't affect SuperServer and is related to parallel index creation.

gsbelarus commented 7 months ago

Confirmed. Everything is ok now. Many thanks!

BTW, I set MaxParallelWorkers = 64 and ParallelWorkers = 16, and the restore speed is awesome!
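
For reference, a sketch of where these settings live: both are firebird.conf parameters introduced in Firebird 5.0. As I understand them (check the Firebird 5 release notes for the exact semantics), MaxParallelWorkers is an upper limit on the number of parallel workers, and ParallelWorkers is the default number of workers a parallelizable task, such as index creation during a gbak restore, may use. The values below are the ones reported in this thread, not general recommendations.

```ini
# firebird.conf (Firebird 5.0+): values reported in this thread.

# Upper limit on the number of parallel workers.
MaxParallelWorkers = 64

# Default number of workers for a parallelizable task, such as building
# indexes during a gbak restore (kept below MaxParallelWorkers).
ParallelWorkers = 16
```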