FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/

Firebird cannot open a database after a power loss [CORE3235] #1285

Open firebird-automations opened 13 years ago

firebird-automations commented 13 years ago

Submitted by: Gili Buzaglo (gland)

Is duplicated by CORE3113

Attachments: corrupted.cmr, corrupted26-12.gdb, corrupted26-12.gdb, unrestorable.gbk.gz

A power failure occurs during normal work with the database. After power is restored, the application cannot connect to the database. We get the following exceptions:

    org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (can't continue after bugcheck)
        at org.firebirdsql.jdbc.AbstractPreparedStatement.<init>(AbstractPreparedStatement.java:127)
        at org.firebirdsql.jdbc.FBPreparedStatement.<init>(FBPreparedStatement.java:41)
        at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.firebirdsql.jdbc.FBStatementFactory.createPreparedStatement(FBStatementFactory.java:90)

    org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (cannot find record fragment (248), file: dpm.cpp line: 1181)
        at org.firebirdsql.jdbc.FBStatementFetcher.fetch(FBStatementFetcher.java:206)
        at org.firebirdsql.jdbc.FBStatementFetcher.next(FBStatementFetcher.java:119)
        at org.firebirdsql.jdbc.AbstractResultSet.next(AbstractResultSet.java:250)
        at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.readData(PeriodicReplicationManager.java:106)
        at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.run(PeriodicReplicationManager.java:503)

and later:

    at org.firebirdsql.gds.GDSException: internal gds software consistency check (cannot find record fragment (248), file: dpm.cpp line: 1181)
        at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.readStatusVector(AbstractJavaGDSImpl.java:2169)
        at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.receiveResponse(AbstractJavaGDSImpl.java:2119)
        at org.firebirdsql.gds.impl.wire.AbstractJavaGDSImpl.iscDsqlFetch(AbstractJavaGDSImpl.java:1350)
        at org.firebirdsql.gds.impl.GDSHelper.fetch(GDSHelper.java:264)
        at org.firebirdsql.jdbc.FBStatementFetcher.fetch(FBStatementFetcher.java:201)
        at org.firebirdsql.jdbc.FBStatementFetcher.next(FBStatementFetcher.java:119)
        at org.firebirdsql.jdbc.AbstractResultSet.next(AbstractResultSet.java:250)
        at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.readData(PeriodicReplicationManager.java:106)
        at cloverleaf.manager.mbe.rm.volume.PeriodicReplicationManager.run(PeriodicReplicationManager.java:503)
    Aug 22, 2010 4:37:48 AM cloverleaf.manager.database.db.SqlConnectionPool$Pool returnConnection();

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

Output of gstat -h is:

    Flags                   0
    Checksum                12345
    Generation              5042
    Page size               4096
    ODS version             11.1
    Oldest transaction      4995
    Oldest active           4996
    Oldest snapshot         4996
    Next transaction        5022
    Bumped transaction      1
    Sequence number         0
    Next attachment ID      117
    Implementation ID       3
    Shadow count            0
    Page buffers            0
    Next header page        0
    Database dialect        3
    Creation date           Oct 18, 2010 14:41:22
    Attributes              force write

    Variable header data:
        Sweep interval:         0
        *END*

firebird-automations commented 13 years ago
Modified by: Gili Buzaglo (gland) Attachment: corrupted.cmr [ 11820 ]
firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

corrupted db is attached

firebird-automations commented 13 years ago

Commented by: @dyemanov

Can ISQL connect or does it also fail with the same error?

firebird-automations commented 13 years ago

Commented by: Greg (greg)

Did you try a backup/restore?

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

Hi, thanks for the reply. ISQL connects fine, and gfix -mend fixed it, but I'm not sure it will always fix it.

firebird-automations commented 13 years ago

Commented by: Greg (greg)

Well, we've used everything from InterBase 4 to Firebird 2.0 and have never lost a database, even though gfix -mend alone was sometimes not enough to recover it. So far GFIX has always done the job... it's routine maintenance for us after a power loss... :)

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

A database engine should be immune to sudden power losses and should not require customer support in the field to fix such issues. This is usually achieved with a transaction log. In the meantime, as a workaround, when my application starts it checks whether the DB is OK with gfix -v -f, and if not, runs a recovery procedure: 1) gfix -mend, 2) backup, 3) restore.

thanks -gili
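The startup workaround described above (validate with `gfix -v -f`; if errors are found, run `gfix -mend`, then back up and restore) could be automated roughly as follows. This is a minimal Java sketch, not the poster's actual code. The class name `StartupRecovery`, the paths and credentials, and the assumption that a validation problem shows up either as a non-zero exit code or as the text "Summary of validation errors" in gfix's output are all hypothetical and should be checked against the Firebird version in use. `-full` is the spelled-out form of the `-f` switch used above. Java 9+ is assumed (`List.of`, `InputStream.readAllBytes`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical startup check: validate the database and, if validation reports
 * problems, run gfix -mend followed by a gbak backup and a restore to a new file.
 */
public class StartupRecovery {

    // ASSUMPTION: text gfix prints when validation finds errors; verify for your version.
    static final String ERROR_MARKER = "Summary of validation errors";

    /** Decides from gfix's exit code and output whether recovery is needed. */
    static boolean validationFailed(int exitCode, String output) {
        return exitCode != 0 || output.contains(ERROR_MARKER);
    }

    /** The validation command (gfix -v -full) with credentials. */
    static List<String> validateCommand(String bin, String db, String user, String pwd) {
        return List.of(bin + "/gfix", "-v", "-full", "-user", user, "-password", pwd, db);
    }

    /** The three recovery steps from the comment above: mend, backup, restore. */
    static List<List<String>> recoverySteps(String bin, String db, String backup,
                                            String restored, String user, String pwd) {
        return List.of(
                List.of(bin + "/gfix", "-mend", "-user", user, "-password", pwd, db),
                List.of(bin + "/gbak", "-b", "-user", user, "-password", pwd, db, backup),
                // Restore into a NEW file so the damaged original is kept for analysis.
                List.of(bin + "/gbak", "-c", "-user", user, "-password", pwd, backup, restored));
    }

    /** Runs one command, appends stdout+stderr to out, and returns the exit code. */
    static int run(List<String> cmd, StringBuilder out) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        out.append(new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8));
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths and credentials, for illustration only.
        String bin = "/usr/local/firebird/bin";
        String db = "/var/opt/CLLF/db/cmr.gdb";
        StringBuilder out = new StringBuilder();
        int rc = run(validateCommand(bin, db, "SYSDBA", "masterkey"), out);
        if (!validationFailed(rc, out.toString())) {
            System.out.println("validation clean");
            return;
        }
        for (List<String> step : recoverySteps(bin, db, db + ".gbk", db + ".restored",
                "SYSDBA", "masterkey")) {
            StringBuilder stepOut = new StringBuilder();
            int code = run(step, stepOut);
            System.out.print(stepOut);
            if (code != 0) {
                throw new IllegalStateException("step failed (" + code + "): " + step);
            }
        }
    }
}
```

Restoring to a separate file is a deliberate choice in this sketch: it leaves the damaged original intact (useful for attaching to a report like this one), at the cost of having to swap the files afterwards.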

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

BTW, it's already possible to connect to the database after gfix -mend alone. How is that?

firebird-automations commented 13 years ago

Commented by: @dyemanov

Generally speaking, it's impossible to guarantee any reliability in the environment you cannot control. For example, no transaction log (or any alternative) can protect you against a storage controller with a write-through cache turned on but without the battery inside.

firebird-automations commented 13 years ago

Commented by: Sean Leyne (seanleyne)

In addition to the other comments, I would also ask if Forced Writes is turned on for the database?

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

For Dmitry: I guess you mean a storage controller with write-back caching turned on. In that case the engine is not expected to keep the database safe. But when the administrator provides storage with synchronous writes, they expect the database to survive power losses.

For Sean: Forced Writes is on.

Thanks guys

firebird-automations commented 13 years ago

Commented by: @hvlad

BTW, this is 2.1.1 on Solaris... Could it be related to CORE1476?

firebird-automations commented 13 years ago

Commented by: @hvlad

The gstat output does not correspond to the attached DB. See also the comments at CORE3113.

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

Hi, is it possible that the attributes are not valid when the DB gets corrupted? I ask because all our DBs are created with fw=true. Or is there any way that this attribute gets lost via gbak -b -> gbak -rep?

firebird-automations commented 13 years ago

Commented by: @dyemanov

Commented by Vlad Khorsun:

> Is it possible that attributes are not valid when the db gets corrupted?

I don't think it's possible. It is very hard to imagine that a single bit was inverted on the header page :)

> I ask this because all our dbs are created with fw=true.
> Or is there any way that this attribute gets lost via gbak -b -> gbak -rep?

The only case I can think of is if the restore was not successful for some reason. IIRC, the FW attribute is set as the last step of the restore process. But in that case the database should be left in single-user shutdown mode...

firebird-automations commented 13 years ago
Modified by: @dyemanov Link: This issue is duplicated by [CORE3113](https://github.com/FirebirdSQL/firebird/issues?q=CORE3113+in%3Atitle)
firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

I've attached a new file. Same problem (although this time the exception is a bit different)...

Jaybird throws an exception:

    wrong page type
    page 3 is of wrong type (expected 4, found 0)
        at org.firebirdsql.jdbc.FBDataSource.getConnection(FBDataSource.java:122)
        at org.firebirdsql.jdbc.FBDriver.connect(FBDriver.java:131)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at cloverleaf.manager.database.db.DatabaseInterface.getConnection(DatabaseInterface.java:60)
        at cloverleaf.manager.database.db.DatabaseManager.init(DatabaseManager.java:113)
        at cloverleaf.manager.database.db.DatabaseManager.<init>(DatabaseManager.java:83)
        at cloverleaf.manager.mbe.run.startMbeServer.fixDB(startMbeServer.java:1683)
        at cloverleaf.manager.mbe.run.startMbeServer.<init>(startMbeServer.java:158)
        at cloverleaf.manager.mbe.run.startMbeServer.getInstance(startMbeServer.java:94)
        at cloverleaf.manager.mbe.run.startMbeServer.main(startMbeServer.java:1175)

You can see that gstat -h shows that forced writes are on.

    /usr/local/firebird/bin/gstat -h /var/opt/CLLF/db/cmr.gdb

    Database "/var/opt/CLLF/db/cmr.gdb"
    Database header page information:
        Flags                   0
        Checksum                12345
        Generation              25560
        Page size               1024
        ODS version             10.1
        Oldest transaction      25534
        Oldest active           25535
        Oldest snapshot         25535
        Next transaction        25553
        Bumped transaction      1
        Sequence number         0
        Next attachment ID      13
        Implementation ID       3
        Shadow count            0
        Page buffers            0
        Next header page        0
        Database dialect        3
        Creation date           May 11, 2009 11:08:26
        Attributes              force write

    Variable header data:
        Sweep interval:         0
        *END*

firebird-automations commented 13 years ago
Modified by: Gili Buzaglo (gland) Attachment: corrupted26-12.gdb [ 11854 ]
firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

A correction to my last comment: the real exception I get is the same as the original one:

    org.firebirdsql.jdbc.FBSQLException: GDS Exception. 335544333. internal gds software consistency check (can't continue after bugcheck)
    Reason: internal gds software consistency check (can't continue after bugcheck)
        at org.firebirdsql.jdbc.InternalTransactionCoordinator$MetaDataTransactionCoordinator.statementCompleted(InternalTransactionCoordinator.java:535)
        at org.firebirdsql.jdbc.AbstractStatement.notifyStatementCompleted(AbstractStatement.java:246)
        at org.firebirdsql.jdbc.AbstractPreparedStatement.notifyStatementCompleted(AbstractPreparedStatement.java:143)
        at org.firebirdsql.jdbc.AbstractPreparedStatement.<init>(AbstractPreparedStatement.java:126)
        at org.firebirdsql.jdbc.FBPreparedStatement.<init>(FBPreparedStatement.java:41)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

gstat -h output:

    /usr/local/firebird/bin/gstat -h /var/opt/CLLF/db/cmr_old.gdb

    Database "/var/opt/CLLF/db/cmr_old.gdb"
    Database header page information:
        Flags                   0
        Checksum                12345
        Generation              97
        Page size               4096
        ODS version             11.1
        Oldest transaction      57
        Oldest active           88
        Oldest snapshot         88
        Next transaction        89
        Bumped transaction      1
        Sequence number         0
        Next attachment ID      21
        Implementation ID       3
        Shadow count            0
        Page buffers            0
        Next header page        0
        Database dialect        3
        Creation date           Dec 26, 2010 10:19:42
        Attributes              force write

    Variable header data:
        Sweep interval:         0
        *END*

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

This attachment is a correction to the previous attachment.

firebird-automations commented 13 years ago
Modified by: Gili Buzaglo (gland) Attachment: corrupted26-12.gdb [ 11855 ]
firebird-automations commented 13 years ago

Commented by: @hvlad

Error "page 3 is of wrong type (expected 4, found 0)" means that the first pointer page is corrupted. Looking at the database file, I see that the whole page is filled with zeros. Looking at the database header statistics, I see that:

a) the database ODS is 10.1, i.e. the database was created by FB 1.5;
b) the page size is 1K, which is a very bad value from a performance POV;
c) the database was created on 11 May 2009, but the next attachment ID is just 13, so nobody works with this database.

This all looks very strange to me. The fact that the whole page is filled with zeros makes me think of HW (or driver) issues. Anyway, without a reproducible test case it is impossible to investigate this issue.

Your last comment again contains stats from *another* database. That doesn't make it any easier for us to understand the issue...
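The header-page reading in the comment above (ODS version, page size, next attachment ID), together with the recurring Forced Writes question, can be checked mechanically. Below is a minimal Java sketch that pulls those fields out of `gstat -h` text with regular expressions. The class name `GstatHeader`, the label spellings it matches, and the warning thresholds are assumptions based on the output pasted in this thread, not a documented format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts a few fields from `gstat -h` output (layout assumed from this thread). */
public class GstatHeader {
    final int pageSize;
    final int odsMajor;
    final int odsMinor;
    final long nextAttachmentId;
    final boolean forcedWrites;

    GstatHeader(int pageSize, int odsMajor, int odsMinor, long nextAttachmentId,
                boolean forcedWrites) {
        this.pageSize = pageSize;
        this.odsMajor = odsMajor;
        this.odsMinor = odsMinor;
        this.nextAttachmentId = nextAttachmentId;
        this.forcedWrites = forcedWrites;
    }

    /** Returns a matcher positioned on the first match, or fails loudly. */
    private static Matcher require(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("not found in gstat output: " + regex);
        }
        return m;
    }

    static GstatHeader parse(String text) {
        int pageSize = Integer.parseInt(require("Page size\\s+(\\d+)", text).group(1));
        Matcher ods = require("ODS version\\s+(\\d+)\\.(\\d+)", text);
        long nextAtt = Long.parseLong(require("Next attachment ID\\s+(\\d+)", text).group(1));
        // "Attributes" is followed by its flag list on the same line (possibly empty).
        Matcher attrs = Pattern.compile("Attributes[ \\t]*([^\\r\\n]*)").matcher(text);
        boolean fw = attrs.find() && attrs.group(1).contains("force write");
        return new GstatHeader(pageSize, Integer.parseInt(ods.group(1)),
                Integer.parseInt(ods.group(2)), nextAtt, fw);
    }

    /** Observations of the kind raised in this thread; thresholds are illustrative. */
    List<String> warnings() {
        List<String> w = new ArrayList<>();
        if (!forcedWrites) {
            w.add("forced writes are OFF");
        }
        if (odsMajor < 11) {
            w.add("ODS " + odsMajor + "." + odsMinor + " was created by an engine older than Firebird 2.0");
        }
        if (pageSize <= 1024) {
            w.add("page size " + pageSize + " is very small for performance");
        }
        return w;
    }

    public static void main(String[] args) {
        String sample = "Page size 1024 ODS version 10.1 Next attachment ID 13 Attributes force write";
        GstatHeader h = parse(sample);
        System.out.println(h.nextAttachmentId + " " + h.warnings());
    }
}
```

Matching on labels rather than fixed line positions lets the same parser accept both the original multi-line gstat layout and the flattened single-line copies that appear in this thread.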

firebird-automations commented 13 years ago

Commented by: @hvlad

The latest attachment contains a corrupted database. If you validate it, you'll see the following errors:

    Chain for record 3115 is broken in table RDB$RELATIONS (6)
    Relation has 6 orphan backversions (123 in use) in table RDB$SECURITY_CLASSES (9)
    Index 1 is corrupt (missing entries) in table ENTITY (241)

The database can be backed up and restored without problems. It is very unusual for system relations to have so many backversions (RDB$RELATIONS has 103 backversions and RDB$SECURITY_CLASSES has 123).

What do you do with the database?

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

Hi. As I wrote in my previous comment, ignore the previous DB, as it was copied at the wrong time. The last attachment contains the problematic DB. I don't know what backversions are. The operations we perform on the DB are: 1) read/write queries, 2) gbak backup/restore, 3) occasional metadata changes to add/remove tables.

This problem occurs only when we shut down the machine, and even then only on rare occasions. In the Firebird log I see that the server terminated normally:

    cc3r5 (Client) Sun Dec 26 10:29:16 2010 /usr/local/firebird/bin/fbguard: /usr/local/firebird/bin/fbserver normal shutdown.

firebird-automations commented 13 years ago

Commented by: Gili Buzaglo (gland)

After one of the corruptions following a shutdown, I tried to use gfix, gbak and restore. gfix and the gbak backup went OK, but the restore fails with an error:

    gbak: ERROR:attempt to store duplicate value (visible to active transactions) in unique index "RDB$PRIMARY101"
    action cancelled by trigger (3) to preserve data integrity
    -Cannot deactivate index used by a PRIMARY/UNIQUE constraint

I've attached the file. Any help would be appreciated.

firebird-automations commented 13 years ago
Modified by: Gili Buzaglo (gland) Attachment: unrestorable.gbk.gz [ 11894 ]