d4g33z / ruffus

Automatically exported from code.google.com/p/ruffus
MIT License

.ruffus_history.sqlite locked on shared network drive #59

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I tried out the latest (97518ac8f0e0) version of ruffus and came across a 
problem with the sqlite history file (traceback below).
Our system uses the Lustre filesystem, and creating sqlite databases on 
this filesystem doesn't work. I've read other reports of people having the same 
issue on shared filesystems. 
It may be due to this issue: http://beets.radbox.org/blog/sqlite-nightmare.html
Creating a database file in /tmp or ~/ works though, so that might be an 
alternative.
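
A quick way to see whether a given directory triggers this failure is to try creating a throwaway database there, mirroring the `create table` call in the traceback below. This probe is a minimal sketch and not part of ruffus itself; the function name is hypothetical:

```python
import os
import sqlite3
import tempfile

# Hypothetical probe: check whether a directory's filesystem supports the
# locking that sqlite needs, by creating and dropping a throwaway database.
# On Lustre/NFS this can raise "database is locked", as in the traceback.
def sqlite_works_in(directory):
    db = os.path.join(directory, ".ruffus_lock_probe.sqlite")
    try:
        con = sqlite3.connect(db)
        con.execute("create table if not exists data (key PRIMARY KEY, value)")
        con.close()
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        try:
            os.remove(db)
        except OSError:
            pass

print(sqlite_works_in(tempfile.gettempdir()))  # a local /tmp should print True
```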

Original exception:

    Exception #1
      'sqlite3.OperationalError(database is locked)' raised in ...
       Task = def gene_dists(...):

    Traceback (most recent call last):
      File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 1130, in signal
        needs_update, msg = self.needs_update_func (*param, task=self)
      File "build/bdist.linux-x86_64/egg/ruffus/file_name_parameters.py", line 442, in needs_update_check_modify_time
        job_history = dbdict.open(RUFFUS_HISTORY_FILE, picklevalues=True)
      File "build/bdist.linux-x86_64/egg/ruffus/dbdict.py", line 330, in open
        return DbDict(filename, picklevalues)
      File "build/bdist.linux-x86_64/egg/ruffus/dbdict.py", line 110, in __init__
        self._create_table()
      File "build/bdist.linux-x86_64/egg/ruffus/dbdict.py", line 123, in _create_table
        self.con.execute('create table data (key PRIMARY KEY,value)')
    OperationalError: database is locked

Original issue reported on code.google.com by tyler.fu...@gmail.com on 22 Nov 2013 at 3:07

GoogleCodeExporter commented 9 years ago
Will see if having a single sqlite connection per pipeline_run / 
pipeline_printout helps...

Original comment by bunbu...@gmail.com on 23 Nov 2013 at 1:51

GoogleCodeExporter commented 9 years ago
These are the latest changes to Ruffus which should hopefully fix this problem:

1) Only a single connection is made to the sqlite database file at a time (for 
any run of the pipeline) so file-locking problems on network file systems such 
as Lustre should hopefully be ameliorated.

2) The history file used for pipeline_run, pipeline_printout and 
pipeline_printout_graph can be set to another location, e.g. on a local drive:
    pipeline_run(.., history_file = "XXX", ...)

(Using only the temp drive sort of defeats the whole purpose of recording 
which files have run successfully in the pipeline.)
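
One practical compromise is to put the history file on a local (non-network) drive but name it after the script, so each pipeline keeps its own persistent history. This helper is an assumption for illustration, not part of the ruffus API; only the `history_file` keyword comes from the comment above:

```python
import os
import tempfile

# Hypothetical helper: build a per-script history db path on the local
# temp drive, sidestepping sqlite locking problems on Lustre/NFS.
def local_history_path(script_path):
    base = os.path.splitext(os.path.basename(script_path))[0]
    return os.path.join(tempfile.gettempdir(),
                        ".%s.ruffus_history.sqlite" % base)

# Would then be passed to ruffus, e.g.:
#   pipeline_run(..., history_file=local_history_path(__file__), ...)
print(local_history_path("/net/share/run.me.py"))
```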

3) The default history file location can be set in 
ruffus.ruffus_utility.RUFFUS_HISTORY_FILE

4) The default history file location can be overridden by the environment 
variable DEFAULT_RUFFUS_HISTORY_FILE

5) The default history file location can use path expansion to automatically 
give each script its own independent history file. This is the safest and 
easiest alternative to using a history file in the local directory.

So if the environment variable is:
    export DEFAULT_RUFFUS_HISTORY_FILE=.{basename}.ruffus_history.sqlite
Then the job history database for "run.me.py" will be 
".run.me.ruffus_history.sqlite"

All the scripts can be set to a single directory by using:
    export DEFAULT_RUFFUS_HISTORY_FILE=/your/path/.{basename}.ruffus_hist.sqlite

If you are really paranoid about name clashes, you can use:
    export DEFAULT_RUFFUS_HISTORY_FILE=/your/path/{path}/.{basename}.sqlite

In which case, the history file for "/test/bin/scripts/run.me.py" will be: 
    /your/path/test/bin/scripts/.run.me.sqlite
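
The placeholder expansion described above can be sketched in plain Python. This is an assumption about the semantics (that `{path}` is the script's directory and `{basename}` its file name without extension, matching the examples given); the function name is hypothetical and the real implementation lives inside ruffus:

```python
import os

# Sketch of the {path}/{basename} expansion, under the assumptions above.
def expand_history_template(template, script_path):
    path, fname = os.path.split(os.path.abspath(script_path))
    basename = os.path.splitext(fname)[0]
    # Strip the leading separator so "/your/path/{path}" does not double up.
    return template.format(path=path.lstrip(os.sep), basename=basename)

print(expand_history_template(
    "/your/path/{path}/.{basename}.sqlite",
    "/test/bin/scripts/run.me.py"))
# → /your/path/test/bin/scripts/.run.me.sqlite (on POSIX systems)
```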

Original comment by bunbu...@gmail.com on 16 Dec 2013 at 6:33