dagolden / Path-Tiny

File path utility

edit_lines() uses shared locking (LOCK_SH) #215

Closed: randstring closed this issue 6 years ago

randstring commented 6 years ago

edit_lines() opens the input file with '<' mode, which is fine for what it does, but that makes filehandle() apply only a shared lock (LOCK_SH). As a result, a competing process can also enter edit_lines() on the same file at the same time, which is unsatisfying behaviour for me.

Perhaps it could be changed to '+<', so that an exclusive lock is taken instead?
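To illustrate the difference, here is a minimal sketch of the documented filehandle() locking behaviour ('config.txt' is just a placeholder for an existing file):

```perl
use Path::Tiny;

my $path = path("config.txt");   # placeholder for an existing file

# '<' (read-only): with { locked => 1 }, filehandle() takes a shared
# lock (LOCK_SH), so two processes can hold it at the same time.
my $ro = $path->filehandle({ locked => 1 }, '<');
close $ro;

# '+<' (read-write): with { locked => 1 }, filehandle() takes an
# exclusive lock (LOCK_EX), so a second opener blocks until it is freed.
my $rw = $path->filehandle({ locked => 1 }, '+<');
close $rw;
```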

xdg commented 6 years ago

I'd like to be careful we avoid an XY Problem. Could you elaborate more on your use case and what about the current behavior is unsatisfying?

randstring commented 6 years ago

Thanks for looking into this.

I have a small library which serves as an interface for manipulating a text config file. It supports line-based update, remove and insert operations. All access to the config passes through this library. It will be used by a number of different scripts, some of which have a high chance of running simultaneously.

Now suppose I have process A, which performs a remove operation. The process calls edit_lines() with a callback that removes some lines by setting them to ''. The remaining lines are written to a temporary file, which is finally moved over the original. Meanwhile, process B decides to replace a few lines and calls edit_lines() on the same file. It has no trouble obtaining a second shared lock, so the operation is performed and the second temporary file is moved over the original. Since the second edit_lines() is called before the first one overwrites the file, it sees the initial contents and restores the deleted lines upon completion. Any updates by the first process are completely lost.
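To make the interleaving concrete, here is a minimal sketch of the two competing edits (the file name and patterns are made up; the two calls stand for two separate processes):

```perl
use Path::Tiny;

my $config = path("app.conf");   # hypothetical config file

# Process A: remove matching lines by blanking them in the callback.
$config->edit_lines( sub { $_ = '' if /^stale_key=/ } );

# Process B, running concurrently: rewrite one setting.
$config->edit_lines( sub { s/^timeout=.*/timeout=30/ } );

# If B opens and reads app.conf before A's temp file is renamed over it,
# B's output is derived from the pre-A contents, so A's removal is undone
# when B's temp file is moved into place.
```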

Now, I could work around this by using some external locking mechanism, but that negates any convenience I get from the edit_lines() method. Taking a write lock when you intend to modify a file is only natural, and it would let the second process simply block until the first one is complete and then read fresh, current content.
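For completeness, the kind of external locking mechanism I mean would look roughly like this (a sketch; the sidecar lock-file name is arbitrary and app.conf is assumed to exist):

```perl
use Fcntl qw(:flock);
use Path::Tiny;

my $config = path("app.conf");

# Every script takes this exclusive lock before editing the config,
# so concurrent edit_lines() calls are serialized.
open my $lock, '>>', 'app.conf.lock' or die "open lock: $!";
flock $lock, LOCK_EX or die "flock: $!";

$config->edit_lines( sub { $_ = '' if /^stale_key=/ } );

close $lock;   # releases the lock
```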

xdg commented 6 years ago

I think there's a race condition, even with exclusive locking, because process A has to drop the lock in order to atomically rename the temp file over the original. In that moment, process B can get a lock on the file and read it.

If you really want to hold a write lock on the file for the duration of the writes, you need to edit in place, not via edit_lines, which is only really intended as an in-process equivalent of perl -pi.

For example, you could use openrw, read the file, seek to the beginning, truncate, then write. Not as trivial as edit_lines, but guarantees control for the full write duration.
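A minimal sketch of that approach, assuming openrw accepts the same { locked => 1 } option as filehandle() (exclusive lock for a read-write mode); the transformation itself is just an example:

```perl
use Path::Tiny;

my $path = path("app.conf");

# Open read-write with an exclusive lock held for the whole operation.
my $fh = $path->openrw({ locked => 1 });

my @lines = <$fh>;                          # read under the lock
@lines = grep { !/^stale_key=/ } @lines;    # example transformation

seek $fh, 0, 0;                             # rewind to the beginning
truncate $fh, 0;                            # drop the old contents
print {$fh} @lines;                         # write the new contents
close $fh;                                  # releases the lock
```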

randstring commented 6 years ago

I believe the second race condition could be solved by swapping the order of the close and move statements, so that the rename happens before the descriptor is closed and the lock released. This should work on Linux and some Unix systems, but I'm not sure about all supported platforms. That still leaves another race condition: you might open the file and wait for an exclusive lock, only to find that the file has already been replaced and you now hold exclusive access to a to-be-deleted inode. This, too, has a solution, as suggested here. I'd rather not update the file in place, to avoid having it in an inconsistent state at any moment.
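For reference, that suggested solution is a lock-then-verify pattern, roughly like this sketch (open_and_lock is a hypothetical helper; it assumes flock semantics and stable device/inode numbers, i.e. POSIX-like platforms):

```perl
use Fcntl qw(:flock);

# Hypothetical helper: open and lock a file, retrying if the locked
# descriptor turns out to point at an inode that was renamed away.
sub open_and_lock {
    my ($file) = @_;
    while (1) {
        open my $fh, '+<', $file or die "open $file: $!";
        flock $fh, LOCK_EX or die "flock $file: $!";

        # While we waited for the lock, another process may have renamed
        # a temp file over $file; compare device/inode of handle and path.
        my @by_handle = stat $fh;
        my @by_path   = stat $file;
        return $fh
            if @by_path
            && $by_handle[0] == $by_path[0]     # same device
            && $by_handle[1] == $by_path[1];    # same inode

        close $fh;    # we locked a stale inode; try again
    }
}
```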

While I understand that you find edit_lines() unsuitable for this use case, perhaps a configuration option or another method could be added to the module that would do this?

xdg commented 6 years ago

Unfortunately, it's not portable to reverse the order. I appreciate your interest, but for your use case, I think you'll need to write a bit of extra code yourself outside Path::Tiny.

ap commented 6 years ago

One of IO::AtomicFile, File::AtomicWrite or IO::File::AtomicChange might be useful to reduce the amount of code you have to write. None of them handle locking for you, though.
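For example, a minimal sketch using File::AtomicWrite for the write-to-temp-and-rename step (assuming its write_file interface; locking would still need to be added around it, e.g. with a lock file or the lock-then-verify pattern above):

```perl
use File::AtomicWrite;
use Path::Tiny;

my $config = path("app.conf");

# Build the new contents in memory (example transformation).
my $new = join '', grep { !/^stale_key=/ } $config->lines;

# write_file stages the data in a temporary file and renames it into
# place, so readers never see a half-written config.
File::AtomicWrite->write_file({ file => "app.conf", input => \$new });
```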