h2 can't recover database from a crash

GoogleCodeExporter commented 9 years ago

We are testing our large database, and if the h2 server is shutdown 
incorrectly, it never seems to be able to reload the database the next time it 
starts up. 

Here is a sample error we get:

General error: "java.lang.RuntimeException: page[246939] data leaf table:207 
T207 entries:74 parent:249485 keys:[1287663, 1287664, 1287665, 1287666, 
1287667, 1287668, 1287669, 1287670, 1287671, 1287672, 1287673, 1287674, 
1287675, 1287676, 1287677, 1287678, 1287679, 1287680, 1287681, 1287682, 
1287683, 1287684, 1287685, 1287686, 1287687, 1287688, 1287689, 1287690, 
1287691, 1287692, 1287693, 1287694, 1287695, 1287696, 1287697, 1287698, 
1287699, 1287700, 1287701, 1287702, 1287703, 1287704, 1287705, 1287706, 
1287707, 1287708, 1287709, 1287710, 1287711, 1287712, 1287713, 1287714, 
1287715, 1287716, 1287717, 1287718, 1287719, 1287720, 1287721, 1287722, 
1287723, 1287724, 1287725, 1287726, 1287727, 1287728, 1287729, 1287730, 
1287731, 1287732, 1287733, 1287734, 1287735, 1287736] offsets:[16222, 16090, 
15946, 15793, 15661, 15517, 15365, 15195, 15055, 14923, 14761, 14629, 14481, 
14345, 14221, 14069, 13924, 13788, 13668, 13512, 13356, 13224, 13071, 12951, 
12819, 12695, 12550, 12406, 12276, 12120, 11976, 11832, 11725, 11593, 11457, 
11317, 11176, 11020, 10867, 10735, 10595, 10449, 10309, 10177, 10024, 9888, 
9764, 9632, 9480, 9324, 9178, 9027, 8887, 8731, 8591, 8459, 8327, 8171, 8043, 
7903, 7739, 7607, 7475, 7343, 7195, 7031, 6907, 6759, 6627, 6487, 6355, 6202, 
6046, 5906] parent 249485 expected 245826" [50000-175] HY000/50000

Original issue reported on code.google.com by localde...@gmail.com on 3 Jul 2014 at 3:58

GoogleCodeExporter commented 9 years ago

Hi,

This looks like a corrupt database. To recover the data, use the tool 
org.h2.tools.Recover to create the SQL script file, and then re-create the 
database using this script. Does it work when you do this?

With version 1.3.171 and older: when using local temporary tables and not 
dropping them manually before closing the session, and then killing the process 
could result in a database that couldn't be opened.

With version 1.3.162 and older: on out of disk space, the database can get 
corrupt sometimes, if later write operations succeed. The same problem happens 
on other kinds of I/O exceptions (where one or some of the writes fail, but 
subsequent writes succeed). Now the file is closed on the first unsuccessful 
write operation, so that later requests fail consistently.

Important corruption problems were fixed in version 1.2.135 and version 1.2.140 
(see the change log). Known causes for corrupt databases are: if the database 
was created or used with a version older than 1.2.135, and the process was 
killed while the database was closing or writing a checkpoint. Using the 
transaction isolation level READ_UNCOMMITTED (LOCK_MODE 0) while at the same 
time using multiple connections. Disabling database file protection using 
(setting FILE_LOCK to NO in the database URL). Some other areas that are not 
fully tested are: Platforms other than Windows XP, Linux, Mac OS X, or JVMs 
other than Sun 1.5 or 1.6; the feature MULTI_THREADED; the features AUTO_SERVER 
and AUTO_RECONNECT; the file locking method 'Serialized'.

I am very interested in analyzing and solving this problem. Corruption problems 
have top priority for me. I have a few questions:

- What is your database URL?
- Did you use LOG=0 or LOG=1? Did you read the FAQ about it?
- Did the system ever run out of disk space?
- Could you send the full stack trace of the exception including message text?
- Did you use SHUTDOWN DEFRAG or the database setting DEFRAG_ALWAYS with H2 
version 1.3.159 or older?
- How many connections does your application use concurrently?
- Do you use temporary tables?
- With which version of H2 was this database created?
    You can find it out using:
    select * from information_schema.settings where name='CREATE_BUILD'
    or have a look in the SQL script created by the recover tool.
- Did the application run out of memory (once, or multiple times)?
- Do you use any settings or special features (for example cache settings,
    two phase commit, linked tables)?
- Do you use any H2-specific system properties?
- Is the application multi-threaded?
- What operating system, file system, and virtual machine
    (java -version) do you use?
- How did you start the Java process (java -Xmx... and so on)?
- Is it (or was it at some point) a networked file system?
- How big is the database (file sizes)?
- How much heap memory does the Java process have?
- Is the database usually closed normally, or is process terminated
    forcefully or the computer switched off?
- Is it possible to reproduce this problem using a fresh database
    (sometimes, or always)?
- Are there any other exceptions (maybe in the .trace.db file)?
    Could you send them please?
- Do you still have any .trace.db files, and if yes could you send them?
- Could you send the .h2.db file where this exception occurs?

Regards,
Thomas

Original comment by thomas.t...@gmail.com on 4 Jul 2014 at 7:29

GoogleCodeExporter commented 9 years ago

- Database url = 
jdbc:h2:tcp://localhost/XXXX;PAGE_SIZE=16384;MAX_QUERY_TIMEOUT=300000;DB_CLOSE_D
ELAY=-1
- I haven't modified LOG
- System has many gigs free yet
- I rarely issue a shutdown, when I do it's just "SHUTDOWN". Haven't touched 
DEFRAG_ALWAYS setting
- Connections vary depending on load; we're still mainly in testing phase, load 
is minimal at times, < 10/15 connections
- No temporary tables
- Create_build = 175
- H2 is using < 1/2 available ram (I set java -Xmx = 200 gigs)
- No special features or system properties
- Applications are heavily multithreaded
- Currently testing on native hardware, windows 2008 r2, java 7
- Data is on a SSD drive, never networked
- Database is currently 20 gigs, it was smaller and larger at different times, 
it has failed to recover on a 10 gig file so I don't think it's strictly size 
related.
- Database is never supposed to be closed, it runs until windows reboots. A 
piece of flaky hardware is occasionally causing a BSOD (still investigating 
that) which is why H2 isn't getting shutdown cleaning using "shutdown".

I will try and send some trace / db files if I catch the error and have a 
second. What we're doing right now is to force a shutdown of the server, 
manually copy the 2 files over to a backup folder, then restart the server. 
This is just a work around so we can keep on moving, I haven't attempted to 
load a non-clean-shutdown database again since it is failing and we're always 
pushed for time; we just restore from our last known good shutdown / backup.

Original comment by localde...@gmail.com on 8 Jul 2014 at 7:00

GoogleCodeExporter commented 9 years ago

I'm just wondering if the bsod is related to the corrupted database. If the 
drive we are storing the h2 files on is "on the way out", could it be causing 
the corrupted h2 database as well as giving us a bsod all at once? What could I 
do in H2 to verify this, maybe store the log file on a different drive or 
something?

Original comment by localde...@gmail.com on 8 Jul 2014 at 8:02

GoogleCodeExporter commented 9 years ago

> Database is never supposed to be closed, it runs until windows reboots.

In theory, this _should_ work with version the PageStore file format (version 
1.3.x). There are quite many tests that should make sure this is supported. 
However: in the past there were problems. The main reason for those problems is 
the complexity of the file format.

Version 1.4.x (the MVStore file format) is much simpler and more robust in this 
respect. I understand version 1.4 is still beta, but I think it's worth a try. 
There are still a few known problems (for example the database file is not as 
small yet as with version 1.3), but probably the don't affect you.

Original comment by thomas.t...@gmail.com on 13 Jul 2014 at 10:09

feristhia / h2database

h2 can't recover database from a crash #570