Open wurdak314 opened 2 years ago
Run validation on regular basis (gfix -val -full) and check for messages at firebird.log put by validation
Thank you for the tip. I have tried the gfix validation on the saved database after the first data loss. gfix gives only a warning:
Summary of validation errors
Number of record level warnings : 1
I didn´ t find much information about this warning message.
Do you think its possible that the sweep causes the data loss?
More information is in firebird.log. Warnings are harmless, don't worry. I never seen data loss because of sweep. So far there is not enough information to make any conclusion.
There are no special entries in the firebird.log. Only the Sweep messages as you see in my first post and connection aborts like
INET/inet_error: read errno = 10054, client host = xxxx, address = x.x.x.x/60754, user = xxx
Not possible. When you saw "Number of record level warnings : 1" than it must be relevant message in the firebird.log
strange but possible. I renamed the firebird.log file to start a fresh one and now gfix writes into the log. I checked older DBs with this results:
Summary of validation errors Number of pointer page warnings : 986
log file: ...... DLO1S006 Wed May 18 06:51:40 2022 Database: C:\KARO3\DB\KARO.DB_12_5_AMABEND Warning: Pointer page 32545 {sequence 2} bits {0x08 secondary} are not consistent with data page 66869 {sequence 2349} state {0x09 full, secondary} in table MATERIAL (186)
DLO1S006 Wed May 18 06:51:40 2022 Database: C:\KARO3\DB\KARO.DB_12_5_AMABEND Warning: Pointer page 32545 {sequence 2} bits {0x05 full, swept} are not consistent with data page 603048 {sequence 2352} state {0x01 full} in table MATERIAL (186)
DLO1S006 Wed May 18 06:51:40 2022 Database: C:\KARO3\DB\KARO.DB_12_5_AMABEND Warning: Pointer page 32545 {sequence 2} bits {0x05 full, swept} are not consistent with data page 603050 {sequence 2354} state {0x01 full} in table MATERIAL (186)
DLO1S006 Wed May 18 06:51:40 2022 Database: C:\KARO3\DB\KARO.DB_12_5_AMABEND Warning: Pointer page 32545 {sequence 2} bits {0x05 full, swept} are not consistent with data page 603052 {sequence 2356} state {0x01 full} in table MATERIAL (186)
DLO1S006 Wed May 18 06:51:52 2022 Database: C:\KARO3\DB\KARO.DB_12_5_AMABEND Validation finished: 0 errors, 986 warnings, 986 fixed
and in another file Summary of validation errors Number of record level warnings : 1
DLO1S006 Wed May 18 06:55:46 2022 Database: C:\KARO3\DB\KARO_KAPUTT030322.DB Validation started
DLO1S006 Wed May 18 07:01:57 2022 Database: C:\KARO3\DB\KARO_KAPUTT030322.DB Warning: Relation has 40 orphan backversions {0 in use} in table MATERIAL (186)
DLO1S006 Wed May 18 07:02:11 2022 Database: C:\KARO3\DB\KARO_KAPUTT030322.DB Validation finished: 0 errors, 1 warnings, 0 fixed
A lot of warnings on pointer pages is not good. Validation fixed it all, btw. It happens sooner of all after crash (or kill) of server process. To be safe, I recommend to run validation immediately before the manual sweep. And keep an eye on validation results.
I have now started a check with gfix again and then carried out a manual sweep. The data is OK. However, there were pointer page warnings again. Always only in the table MATERIAL . What can be the reason for this? Is there anything special about this table? What can I learn from the log file entries?
DLO1S006 Fri May 20 22:21:15 2022
Database: karodb
Warning: Pointer page 505 {sequence 0} bits {0x05 full, swept} are not consistent with data page 66553 {sequence 513} state {0x01 full} in table MATERIAL (186)
Usually such inconsistencies related with not correct stop of database without flushing of page cache.
Check also forced writes setting on that db.
I'll re-check careful writes logic around swept
flag, probably we have a problem here.
In any case - I can't imagine reason for such inconsistency in normal\correct operational environment.
Of course it doesn't means we will not try to find and fix the possible bug.
This is the 2nd time I have lost data in a database. First in version 3.0.7 and after an update also in version 3.0.9 (gfix version WI-V3.0.9.33560 Firebird 3.0) (Windows x64 - version on Windows Server 2016 Datacenter in azure)
In a table, around 10 out of 15,000 data records suddenly disappeared. Since this table is in a master-detail relationship and the related detail records still exist, this shouldn't be possible.
Master table: MATERIAL with primary key MANUMM Detail table: MATLAGER with the foreign key MLAMANUMM
ALTER TABLE MATLAGER ADD FOREIGN KEY (MLAMANUMM) REFERENCES MATERIAL (MANUMM) ON DELETE CASCADE;
with the following query I found all lost records:In a correct database this query shouldn´t return any records if the foreign key is active.
I was able to repair the database again by adding the MATERIAL records again. The system did not crash, but it was noticeable that both data losses happened shortly after an automatic sweep (interval 20000). The sweeps in the log file ran without errors: Here are the entries from the log file:
.....
I have temporarily set the sweepinterval to 0 and will carry out the sweep manually in the future. Whats else can I do?