ScottDuckworth / python-anyvcs

A Python abstraction layer for multiple version control systems
BSD 3-Clause "New" or "Revised" License

anydbm cache is not multi-process safe #27

Closed ScottDuckworth closed 10 years ago

ScottDuckworth commented 11 years ago

I'm finding plenty of information on how well Berkeley DB scales across concurrent processes using reader/writer locks (see http://docs.oracle.com/cd/E17076_03/html/programmer_reference/cam.html for one example), but my guess is that the Python dbm module doesn't pass the necessary flags. Grrr...damn Python!

I have found http://www.jcea.es/programacion/pybsddb.htm (referenced by http://docs.python.org/2/library/bsddb.html), which gives more direct access to the Berkeley DB C API. We might have some luck with it...

ScottDuckworth commented 11 years ago

Another thing to consider, which I had not considered before, is the license of the embedded database software.

Berkeley DB is released under a dual license (http://www.oracle.com/technetwork/database/berkeleydb/downloads/licensing-098979.html). The open-source license essentially says that if you redistribute a program which uses Berkeley DB, its source code must be freely available to all. This is not an issue for python-anyvcs since it is LGPL, but non-free software that uses python-anyvcs and is redistributed would have to acquire a commercial license for Berkeley DB (and I've read that they can be pricey).

GNU dbm is GPL (http://savannah.gnu.org/projects/gdbm), which I believe is incompatible with LGPL.

SQLite is public domain (http://www.sqlite.org/copyright.html).

ScottDuckworth commented 11 years ago

Another possibility is to abandon the use of an embedded database altogether by keeping the "database" in a directory tree, where keys are file names and values are the contents of the files. I wonder what the performance hit would be...

This would also allow for super-easy record-level locking - just lock the files!
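
To illustrate the idea, here is a minimal sketch of per-key file locking, assuming a Unix system where `fcntl.flock` is available. The `put`/`get` helpers and the key-to-path mapping are hypothetical names made up for this example, not anything from python-anyvcs, and the keys are assumed to be valid file names already.

```python
import fcntl
import os
import tempfile


def _path(root, key):
    # Hypothetical mapping: the key is used directly as the file name.
    # Real keys would need escaping (e.g. for "/").
    return os.path.join(root, key)


def put(root, key, value):
    # Open without truncating, so a concurrent reader never sees an
    # empty file before we hold the exclusive lock.
    fd = os.open(_path(root, key), os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no one else holds a lock
        f.seek(0)
        f.truncate()
        f.write(value)
        f.flush()
        # Closing the file releases the lock.


def get(root, key):
    with open(_path(root, key), "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)  # many readers may share the lock
        return f.read()


root = tempfile.mkdtemp()
put(root, "rev-1", b"cached log entry")
print(get(root, "rev-1"))
```

Because each key has its own file, a writer holding the lock on one key does not block readers or writers of any other key. Note that `flock` locks are advisory: they only protect against processes that also take the lock.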

ScottDuckworth commented 10 years ago

b233c34 removes all uses of anydbm and instead uses a custom file-based dictionary object which is multi-process safe and free of licensing concerns. The performance hit is almost non-existent, at least once the files are cached by the OS, though I'm sure this depends heavily on the file system that is used.
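
For readers curious what a file-based dictionary can look like, below is a rough sketch. It is not the code from b233c34: the `FileDict` class, the `k-` file name prefix, and the write-then-rename strategy are all assumptions chosen for illustration, and it is written for Python 3 rather than the Python 2 of the original thread.

```python
import os
import tempfile
from collections.abc import MutableMapping
from urllib.parse import quote, unquote


class FileDict(MutableMapping):
    """Hypothetical directory-backed mapping: one file per key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Percent-encode the key so "/" cannot escape the directory;
        # the "k-" prefix keeps entries distinct from temp files.
        return os.path.join(self.root, "k-" + quote(key, safe=""))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key)

    def __setitem__(self, key, value):
        # Write to a temp file in the same directory, then rename it
        # into place, so readers see the old or new value, never a
        # partially written one.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(value)
        os.replace(tmp, self._path(key))

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key)

    def __iter__(self):
        for name in os.listdir(self.root):
            if name.startswith("k-"):
                yield unquote(name[2:])

    def __len__(self):
        return sum(1 for _ in self)


cache = FileDict(tempfile.mkdtemp())
cache["trunk/README"] = b"contents"
print(cache["trunk/README"])
```

The rename-based write avoids explicit locks for simple get/set traffic, but it does not make read-modify-write sequences atomic; those would still need a lock of some kind.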