Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

RE: The DB_File locking method is flawed #601

Closed p5pRT closed 21 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1460 (status was 'resolved')

Searchable as RT1460$

p5pRT commented 24 years ago

From David

Hi David,

thanks for creating an *excellent* test harness to illustrate a nasty flaw in the recommended method of locking DB_File databases. I had no problem reproducing it here.

I agree with your analysis of why things aren't working as expected with the current DB_File locking technique. The Berkeley DB cache is certainly the root of the problem. Looks like the locking section in DB_File.pm needs to be rewritten.

WRT your DB_Wrap module, I need to have a play with it to see what its limitations are.

I've cc'ed this to perl5-porters to widen the discussion on the merits of DB_Wrap. For the benefit of p5p, DB_Wrap creates an external lock file for the database (rather than locking the database itself) to control access.

cheers Paul

Hello,

forwarding this on to you... Paul because you are the DB_File guy and Stas because you asked me to CC you

This was just posted in comp.lang.perl.misc a few min ago.

- David Harris Principal Engineer, DRH Internet Services

David Harris <dharris@drh.net> wrote in message news:CISD3.1555$zI3.41304@iad-read.news.verio.net...

Hi,

I've been doing some work with the DB_File module and mod_perl and I've found a bug in the "commonly accepted" procedure for locking a database file. This procedure is outlined in the POD documentation for the DB_File module and in the Programming Perl book where it covers the DB_File module and other examples and guides. The bug can lead to database corruption.

The example locking method goes like this:

    $db = tie(%db, 'DB_File', '/tmp/foo.db', O_CREAT|O_RDWR, 0644)
        || die "dbcreat /tmp/foo.db $!";
    $fd = $db->fd;
    open(DB_FH, "+<&=$fd") || die "dup $!";
    flock (DB_FH, LOCK_SH) || die "flock: $!";

The problem is that the database file is opened and then later locked -- and when the database is opened the first 4k (in my dbm library) is read and then cached in memory. Therefore, a process can open the database file and cache the first 4k, then block in the flock while another process modifies the first 4k of the file. When the original process gets the lock, it now has an inconsistent view of the database, and if it writes using this inconsistent view, it may corrupt the database on disk.

This does not cause corruption every time a process has to block in the flock call, because one can do quite a bit of writing to the database file without actually changing the first 4k of the file.

I first saw this problem when I investigated using strace to see a listing of all the system calls. I have verified that the tie line not only produces an open system call, but also a read system call on the database file.

To be sure of the problem, I've created an example program that shows database corruption on disk when using this locking method. It simply forks off two processes. The first gets the lock and writes a bunch of records to the database, while the second blocks in the flock call. Then the first process finishes and the second writes a bunch of records. When the second process is done, most of the records written by the first process are no longer in the database. Stas Bekman (author of the mod_perl guide) has looked at the program and verified that he is also seeing corruption.

My example program, along with some documentation and the system call trace, are available in this archive:

http://www.davideous.com/misc/dblockflaw-1.2.tar.gz
http://www.davideous.com/misc/dblockflaw-1.2/

I believe that the correct fix for this problem is to simply gain a lock before letting DB_File touch the database file. For my own work, I've created a module to this effect called DRH::DB_Wrap which wraps the DB_File module and adds one argument to the tie command, which specifies what locking should be done, "read" or "write". IMHO, this is a nice solution for the locking problem, and I'm wondering if it would be useful to others. I'd be glad to formally package it up (of course removing my internal "DRH::" prefix) if people think this would be useful.
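For readers who want to see the lock-before-tie ordering concretely, here is a minimal sketch. It is not the DRH::DB_Wrap code or its interface: the helper name `with_locked_db`, the callback style, and the `"$file.lock"` naming are all assumptions made for illustration. The only point it tries to show is that the lock on an external file is held before `tie` opens the database and is released only after `untie`.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_CREAT O_RDWR O_RDONLY);
use DB_File;

# Hypothetical helper (not part of DB_File or DRH::DB_Wrap).
# $mode is "read" or "write"; $code is called with a reference to the tied hash.
sub with_locked_db {
    my ($file, $mode, $code) = @_;
    my $lockfile = "$file.lock";    # assumed naming convention

    # 1. Take the lock on a separate file, so nothing has read the
    #    database (and cached its first pages) before we hold the lock.
    sysopen(my $lock_fh, $lockfile, O_RDWR | O_CREAT, 0644)
        or die "open $lockfile: $!";
    flock($lock_fh, $mode eq 'write' ? LOCK_EX : LOCK_SH)
        or die "flock $lockfile: $!";

    # 2. Only now let DB_File open and read the database file.
    my %db;
    my $flags = $mode eq 'write' ? (O_CREAT | O_RDWR) : O_RDONLY;
    tie(%db, 'DB_File', $file, $flags, 0644, $DB_HASH)
        or die "tie $file: $!";

    # 3. Run the caller's code, remembering any exception for later.
    my @result;
    my $ok  = eval { @result = $code->(\%db); 1 };
    my $err = $@;

    # 4. Flush and close the database BEFORE giving up the lock.
    untie %db;
    close($lock_fh);                # closing the handle releases the flock

    die $err unless $ok;
    return wantarray ? @result : $result[0];
}
```

A caller might write `with_locked_db('/tmp/foo.db', 'write', sub { $_[0]{key} = 'value' })`. Because the lock lives on a separate file, every program that touches the database has to go through the same convention; a process that ties the file directly is not excluded.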

You can get a copy of this DRH::DB_Wrap module at:

http://www.davideous.com/misc/DB_Wrap.pm

I'm looking forward to seeing this get fixed.

- David Harris Principal Engineer, DRH Internet Services

p5pRT commented 21 years ago

@iabyn - Status changed from 'stalled' to 'resolved'