Open mgorny opened 9 months ago
Please do. I would hate to lose native installation support for my favorite player inside my favorite distro.
Which database would you propose that we move to?
I googled a bit and, without knowing anything about how Berkeley DB is used, I found this site that proposes alternatives: https://alternativeto.net/software/berkeley-db/ Of those, leveldb and qdbm are both present in Gentoo's package manager.
That's not a very compelling argument.
> Which database would you propose that we move to?
I think the closest option to Berkeley DB is GDBM. It's supported by Python out-of-the-box via the dbm.gnu module (except on Windows).
The more modern and portable option would be sqlite3, also supported by Python out-of-the-box. You could also use SQLite3 via sqlalchemy, if you prefer an ORM API.
(Note that by "out of the box", I mean in a default Python build. Technically you could build Python without them, but I think that's rather rare and breaks other packages. On some distros the Python package may be split and the relevant extensions installed separately.)
I've looked into some of these simple key-value databases before.
Berkeley DB. What we use now. Works everywhere. Most OSes build the bindings against version 5.3 (before the switch to AGPL), which AFAIK doesn't have any breaking issues. Arch doesn't do this but they build against 6.2, which was already several years outdated at the time—I don't know if there was a particular reason.
gdbm & ndbm. Bindings maintained by the Python project. Doesn't work on Windows. MSYS2 used to include it (not anymore) and during Exaile 4.0 beta testing we found out that it was broken there as well. This was the point where we decided to exclusively use bsddb/berkeleydb.
Python's dbm.dumb. Works everywhere. "Intended as a last resort fallback ... not written for speed ... not nearly as heavily used ..." basically not very encouraging words from the developers themselves.
LMDB. Bindings used to be an issue but now seem widely available (checked on Debian, Fedora, Arch, and MSYS2). If I remember correctly, LMDB requires manual garbage collection (by copying the data to a new database) because it doesn't reuse space from removed items.
LevelDB. Bindings packaged on Debian but not Fedora or MSYS2, as far as I can find.
SQLite. Well, it's not very simple and not a k-v database, but we could create a simple wrapper that just stores everything in one table with two columns. Bindings maintained by Python project and available everywhere.
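To make the "one table with two columns" idea concrete, here is a minimal sketch of such a wrapper using only the standard library. The class name `SQLiteKV`, the table name `kv`, and the choice to store values as bytes are assumptions for illustration, not Exaile's actual code.

```python
import sqlite3
from collections.abc import MutableMapping


class SQLiteKV(MutableMapping):
    """Hypothetical dict-like wrapper: one table, two columns (key, value)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # INSERT OR REPLACE overwrites an existing row with the same key.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM kv"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()


db = SQLiteKV()
db["artist"] = b"serialized-bytes-here"
print(db["artist"], len(db))
db.close()
```

Deriving from `MutableMapping` gives `get`, `in`, `update` and friends for free, which keeps the wrapper close to the dict-style interface that the dbm modules expose.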
For the record, the latest Berkeley DB version (18.1) is released under GNU AGPL 3.0, which is compatible with the license of Exaile's codebase. As far as I know, it's also compatible with the licenses of our dependencies (Mutagen was GPL 2.0-only but they switched to GPL 2.0-or-later). It would make the whole Exaile+dependencies distribution AGPL 3.0 but I'm not aware of any legal issues.
I think ideally someone with clout in Linux distro circles should fork Berkeley DB 5.3 and declare the project complete. Then projects can simply link to that new library and be done with all this bikeshedding.
Otherwise, if Fedora does decide to remove Berkeley DB without an alternative (have they decided on this, or is it still just a proposal?), I think realistically the best way forward for us is to use SQLite with that stupid k-v wrapper I mentioned.
> I think ideally someone with clout in Linux distro circles should fork Berkeley DB 5.3 and declare the project complete.
It's not "complete", it's dead. It already requires patching to build at all and code rot will increase.
> (have they decided on this, or is it still just a proposal?)
From what I understand, it's been decided but they didn't decide when it's going to happen.
> I think realistically the best way forward for us is to use SQLite with that stupid k-v wrapper I mentioned
I agree.
LMDB seems like a much better choice than SQLite as a key-value store. LMDB does not need a compaction or garbage collection phase at all, as the database uses B+trees to store data and tracks where free pages are.
Also, the Python bindings are actively maintained and there are wheels for Windows, Linux and macOS on PyPI. https://github.com/jnwatson/py-lmdb/ https://pypi.org/project/lmdb/#files
Thanks, that makes LMDB quite an attractive option. I don't know why I thought it needed offline compaction.
By the way, I tried the SQLite wrapper solution (basically the same as Python 3.13's dbm.sqlite3, which I only saw later) and ran into an issue: the Python sqlite3 module will outright refuse to execute queries if the current thread is different from the thread that opened the database. This works fine for music.db but not for lyrics.cache (which uses a long-running db connection and regulates multithread access using a lock).
Another option you can try: https://github.com/piskvorky/sqlitedict
I don't know how it performs compared to LMDB, but it has a simple API, and you can practically vendor the file into Exaile. That makes installation on Windows easy.
I've just tried LMDB and unfortunately there's a major issue with it. I think it's an LMDB bug but currently I don't have time to investigate further.
What happens is that when you open a database, LMDB requires a map_size argument which is the maximum size the db can grow to. The documentation says "On 64-bit there is no penalty for making this huge (say 1TB)." However, on Windows (MSYS2 build) it turns out the whole size is allocated on disk.
Thanks again for looking at this. There is a known issue with LMDB, but at the same time it might not be an issue. First, see
https://github.com/jnwatson/py-lmdb/issues/85#issuecomment-91938590
It is known for 32-bit Windows. Does it affect 64-bit too? Second, it might be just a display issue and not the real disk size.
> Thanks, that makes LMDB quite an attractive option. I don't know why I thought it needed offline compaction.
> By the way, I tried the SQLite wrapper solution (basically the same as Python 3.13's dbm.sqlite3, which I only saw later) and ran into an issue: the Python sqlite3 module will outright refuse to execute queries if the current thread is different from the thread that opened the database. This works fine for music.db but not for lyrics.cache (which uses a long-running db connection and regulates multithread access using a lock).
Depending on how you created the database connection you either get an exception or not.
See discussion here: https://ricardoanderegg.com/posts/python-sqlite-thread-safety/
Can you share your branch? Maybe I can help.
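For anyone following along, the "exception or not" difference can be reproduced with a small standard-library sketch. It assumes the relevant switch is the `check_same_thread` argument of `sqlite3.connect`; the helper `query_from_other_thread` is made up for this illustration.

```python
import sqlite3
import threading


def query_from_other_thread(conn):
    """Run a query on `conn` from a new thread; return the row or the error."""
    outcome = []

    def worker():
        try:
            outcome.append(conn.execute("SELECT 1").fetchone())
        except sqlite3.ProgrammingError as exc:
            outcome.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


# Default connection: usable only from the thread that created it.
strict = sqlite3.connect(":memory:")
# Opting out of the check: the caller must now serialize access itself.
shared = sqlite3.connect(":memory:", check_same_thread=False)

print(type(query_from_other_thread(strict)).__name__)
print(query_from_other_thread(shared))
```

With `check_same_thread=False` the sqlite3 module no longer guards the connection, so a long-running shared connection like the lyrics.cache one would still need its own lock around every call.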
Thanks, that article is very helpful. I've made some more changes and pushed my current work to the sqlite branch in this repository. It now works correctly for lyrics.cache as well. It's currently missing tests, and the old db migration tests need to be adapted.
I'm still not completely sure about the impact of the sqlite3.threadsafety level (DB-API doc) on all this. My understanding is that we require level 1 at least, because at level 0 we need to have one lock for all SQLite calls, even the ones that use different connections. I think it's unlikely for any desktop distro to compile SQLite in single-thread mode (level 0) though.
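A rough way to check this at runtime is sketched below. The helper `needs_global_lock` is hypothetical and just encodes the reasoning above. One caveat I'm aware of: before Python 3.11 the `sqlite3.threadsafety` attribute was hard-coded to 1, so only on 3.11+ does it reflect how the underlying SQLite library was compiled.

```python
import sqlite3

# DB-API 2.0 threadsafety levels:
#   0 = threads may not share the module
#   1 = threads may share the module, but not connections
#   3 = threads may share the module, connections and cursors
level = sqlite3.threadsafety


def needs_global_lock(threadsafety_level):
    """Hypothetical helper: True if all SQLite calls must share one lock."""
    return threadsafety_level < 1


print(level, needs_global_lock(level))
```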
Berkeley DB is deprecated since Fedora 33 and will eventually be removed. Gentoo is also following suit. Please consider switching to another database backend.