byzhang / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

Possible bug: Missing a fsync() or msync() call after creating MANIFEST-000001 #183

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This is about the scenario where a power crash happens while a database is 
being created. The bug is triggered only if the crash happens within a narrow 
time interval, and only when certain filesystems (e.g., ext4) are used. 
Furthermore, the bug does not actually corrupt any data; it only causes an 
IO error to be reported on a database that was still being created (i.e., is empty).

So I'm not sure this behavior is "wrong". Please ignore if you already knew 
about this behavior.

What steps will reproduce the problem?

1. Use a Fedora/Ubuntu machine. Create a new leveldb database in an ext4 
partition that no other process is writing to.

2. During the creation, use some trick to crash the machine soon after the 
msync() corresponding to "MANIFEST-000002" happens. Specifically, [A] the crash 
should happen before any sync-like call after rename("000002.dbtmp"), and [B] 
the crash should happen within a few seconds. A probable trick would be to 
add a sleep() after msync("MANIFEST-000002"), and manually pull the plug as 
soon as the sleep() is triggered.

3. Reboot the machine. The database would now have a CURRENT file pointing to 
MANIFEST-000001, but MANIFEST-000001 would be empty (behavior would be 
slightly different depending on the filesystem). Try opening the database again 
with leveldb.
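
To check whether a reboot left the directory in the state described in step 3, something like the following could be run before opening the database. This is only a rough sketch: `inspect_current` and its return strings are hypothetical names made up for illustration, not part of leveldb, and it assumes CURRENT holds the manifest file name followed by a newline.

```python
import os
import tempfile


def inspect_current(dbdir):
    """Classify the state of CURRENT and the manifest it names.

    Hypothetical helper for illustration; not a leveldb API.
    """
    current_path = os.path.join(dbdir, "CURRENT")
    if not os.path.exists(current_path):
        return "no CURRENT"
    with open(current_path, "r") as f:
        name = f.read().strip()
    manifest_path = os.path.join(dbdir, name)
    if not name or not os.path.isfile(manifest_path):
        return "manifest missing"
    if os.path.getsize(manifest_path) == 0:
        # The state reported in this issue: CURRENT is durable,
        # but the manifest contents never reached the disk.
        return "manifest empty"
    return "ok"


if __name__ == "__main__":
    # Simulate the post-crash layout in a scratch directory.
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "CURRENT"), "w") as f:
        f.write("MANIFEST-000001\n")
    open(os.path.join(d, "MANIFEST-000001"), "wb").close()
    print(inspect_current(d))
```

On a directory in the state from step 3, this would classify the manifest as empty rather than missing.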

What is the expected output? What do you see instead?
Leveldb would report an IO Error. It is expected to just open the (empty) 
database and continue working.
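
For reference, the ordering I would expect to avoid this window is: sync the manifest contents, sync the temporary file, rename it to CURRENT, then sync the directory. Below is a minimal sketch of that ordering using plain POSIX calls through Python. It is not leveldb's code; `create_db_files`, `write_and_sync`, `fsync_dir` and the placeholder manifest body are all hypothetical, and it assumes a Linux-like system where a directory can be opened and fsync()ed.

```python
import os
import tempfile


def fsync_dir(path):
    """fsync a directory so entries created or renamed in it become durable."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_and_sync(path, data):
    """Write data to path and fsync the file before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def create_db_files(dbdir, manifest_number=1, manifest_body=b"placeholder\n"):
    """Create a manifest and point CURRENT at it, syncing at each step.

    Hypothetical illustration of the ordering; not leveldb's implementation.
    """
    manifest = "MANIFEST-%06d" % manifest_number
    tmp = "%06d.dbtmp" % manifest_number
    # 1. Manifest contents are on disk before anything refers to them.
    write_and_sync(os.path.join(dbdir, manifest), manifest_body)
    # 2. The temporary file naming the manifest is on disk before the rename.
    write_and_sync(os.path.join(dbdir, tmp), (manifest + "\n").encode())
    # 3. Atomically switch CURRENT, then make the directory change durable.
    os.rename(os.path.join(dbdir, tmp), os.path.join(dbdir, "CURRENT"))
    fsync_dir(dbdir)
    return manifest


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(create_db_files(d), "created in", d)
```

With this ordering, a crash at any point should leave CURRENT either absent or naming a manifest whose contents were already synced, though I have not verified this on every filesystem.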

What version of the product are you using? On what operating system?
Leveldb-1.12.0. I used Ubuntu 12.04, although most Linux OSes should behave the 
same way. I suspect that filesystems other than ext4 behave the same way 
as well.

Please provide any additional information below.
I'm more involved in filesystem research than in using leveldb, so I might be 
totally wrong. Do let me know if any additional tests would be useful from my 
side, I will be happy to help.

Original issue reported on code.google.com by madthanu@gmail.com on 30 Jun 2013 at 9:26

GoogleCodeExporter commented 9 years ago
Ding? Is the leveldb community interested in bugs like these at all?

I think I have discovered a couple of similar bugs, but would it be useful 
to you guys to have new issues created?

Original comment by madthanu@gmail.com on 15 Jul 2013 at 12:57

GoogleCodeExporter commented 9 years ago
Sorry for not responding. There aren't too many people spending a significant 
amount of time on leveldb, so bugs are on the back burner except for urgent 
things like corruption or crashes. Furthermore, this particular bug might be 
affected by some directory syncing work that is in progress.

I suspect that bugs like this one will probably also get looked at at some point 
when somebody has some time.  So it would be helpful if you have other similar 
things you can point out.

Thanks.

Original comment by san...@google.com on 15 Jul 2013 at 11:29

GoogleCodeExporter commented 9 years ago
Sure! I'll report all the potential bugs.

Original comment by madthanu@gmail.com on 15 Jul 2013 at 11:37