exaile / exaile

:notes: Cross-platform music player
https://www.exaile.org
GNU General Public License v2.0

Please move away from Berkeley DB #915

Open mgorny opened 9 months ago

mgorny commented 9 months ago

Berkeley DB has been deprecated since Fedora 33 and will eventually be removed. Gentoo is following suit. Please consider switching to another database backend.

Blinker73 commented 9 months ago

Please do. I would hate to lose native installation support for my favorite player in my favorite distro.

virtuald commented 9 months ago

Which database would you propose that we move to?

Blinker73 commented 9 months ago

I googled a bit and, without knowing anything about how Berkeley DB is used, I found this site that proposes alternatives: https://alternativeto.net/software/berkeley-db/ I found that leveldb and qdbm are both present in Gentoo's package manager.

virtuald commented 9 months ago

That's not a very compelling argument.

mgorny commented 9 months ago

> Which database would you propose that we move to?

I think the closest option to Berkeley DB is GDBM. It's supported by Python out of the box via the dbm.gnu module (except on Windows).

The more modern and portable option would be sqlite3, also supported by Python out-of-the-box. You could also use SQLite3 via sqlalchemy, if you prefer an ORM API.

(Note that by "out of the box", I mean in a default Python build. Technically you could build Python without them, but I think that's rather rare and breaks other packages. On some distros the Python package may be split, and the relevant extensions may have to be installed separately.)

sjohannes commented 9 months ago

I've looked into some of these simple key-value databases before.

For the record, the latest Berkeley DB version (18.1) is released under GNU AGPL 3.0, which is compatible with the license of Exaile's codebase. As far as I know, it's also compatible with the licenses of our dependencies (Mutagen was GPL 2.0-only but they switched to GPL 2.0-or-later). It would make the whole Exaile+dependencies distribution AGPL 3.0 but I'm not aware of any legal issues.

I think ideally someone with clout in Linux distro circles should fork Berkeley DB 5.3 and declare the project complete. Then projects can simply link to that new library and be done with all this bikeshedding.

Otherwise, if Fedora does decide to remove Berkeley DB without an alternative (have they decided on this, or is it still just a proposal?), I think realistically the best way forward for us is to use SQLite with that stupid k-v wrapper I mentioned.
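For illustration, a key-value wrapper over SQLite could look roughly like the sketch below. This is a minimal sketch under assumptions, not the wrapper actually used in Exaile: the class name `SqliteKV`, the table name `kv`, and the choice of text keys with blob values are all hypothetical.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteKV(MutableMapping):
    """Hypothetical dict-like store backed by a single SQLite table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # INSERT OR REPLACE overwrites an existing row with the same key.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self._conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Fetch all keys up front so callers can modify the store while iterating.
        for (key,) in self._conn.execute("SELECT key FROM kv").fetchall():
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()
```

Usage would be the same as a dict, e.g. `db = SqliteKV(":memory:"); db["track"] = b"data"`. Committing on every write keeps the sketch simple; a real implementation would probably batch writes.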

mgorny commented 9 months ago

> I think ideally someone with clout in Linux distro circles should fork Berkeley DB 5.3 and declare the project complete.

It's not "complete", it's dead. It already requires patching to build at all and code rot will increase.

> (have they decided on this, or is it still just a proposal?)

From what I understand, it's been decided but they didn't decide when it's going to happen.

virtuald commented 9 months ago

> I think realistically the best way forward for us is to use SQLite with that stupid k-v wrapper I mentioned

I agree.

oz123 commented 3 months ago

LMDB seems like a much better choice than SQLite as a key-value store. LMDB does not need a compaction or garbage collection phase at all, as the database uses B+ trees to store data and track where free pages are.

Also, the Python bindings are actively maintained, and there are wheels for Windows, Linux and macOS on PyPI. https://github.com/jnwatson/py-lmdb/ https://pypi.org/project/lmdb/#files

sjohannes commented 3 months ago

Thanks, that makes LMDB quite an attractive option. I don't know why I thought it needed offline compaction.

By the way, I tried the SQLite wrapper solution (basically the same as Python 3.13's dbm.sqlite3, which I only saw later) and ran into an issue: the Python sqlite3 module will outright refuse to execute queries if the current thread is different from the thread that opened the database. This works fine for music.db but not for lyrics.cache (which uses a long-running db connection and regulates multithread access using a lock).
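A minimal sketch of that behaviour, using only the standard library. The in-memory database and the `worker` function are illustrative and not taken from Exaile:

```python
import sqlite3
import threading

# check_same_thread defaults to True, so the connection is bound
# to the thread that created it (here, the main thread).
conn = sqlite3.connect(":memory:")
errors = []


def worker():
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        # Raised because this thread did not open the connection.
        errors.append(exc)


t = threading.Thread(target=worker)
t.start()
t.join()
print(errors)
```

With the default settings, the query in the worker thread is rejected with `sqlite3.ProgrammingError` before it ever reaches SQLite.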

oz123 commented 3 months ago

Another option you can try: https://github.com/piskvorky/sqlitedict

I don't know how it performs compared to LMDB, but it has a simple API, and you can practically vendor the file into Exaile. That makes installation on Windows easy.

sjohannes commented 3 months ago

I've just tried LMDB and unfortunately there's a major issue with it. I think it's an LMDB bug but currently I don't have time to investigate further.

What happens is that when you open a database, LMDB requires a map_size argument which is the maximum size the db can grow to. The documentation says "On 64-bit there is no penalty for making this huge (say 1TB)." However, on Windows (MSYS2 build) it turns out the whole size is allocated on disk.

oz123 commented 3 months ago

Thanks again for looking at this. There is a known issue with LMDB here, but it might not actually be a problem. First, see:

https://github.com/jnwatson/py-lmdb/issues/85#issuecomment-91938590

It is known to affect 32-bit Windows. Does it affect 64-bit too? Second, it might just be a display issue and not the real size on disk.

oz123 commented 1 day ago

> Thanks, that makes LMDB quite an attractive option. I don't know why I thought it needed offline compaction.
>
> By the way, I tried the SQLite wrapper solution (basically the same as Python 3.13's dbm.sqlite3, which I only saw later) and ran into an issue: the Python sqlite3 module will outright refuse to execute queries if the current thread is different from the thread that opened the database. This works fine for music.db but not for lyrics.cache (which uses a long-running db connection and regulates multithread access using a lock).

Depending on how you created the database connection, you may or may not get an exception.

See discussion here: https://ricardoanderegg.com/posts/python-sqlite-thread-safety/
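As a rough sketch of one approach along those lines: open the connection with `check_same_thread=False` and serialize all access with a lock. The `LockedConnection` class and the `cache` table are hypothetical names for illustration, and this is not necessarily how Exaile's lyrics.cache handles it.

```python
import sqlite3
import threading


class LockedConnection:
    """Hypothetical helper: one shared connection, guarded by a lock."""

    def __init__(self, path):
        # Disable the same-thread check; we take responsibility for
        # serializing access ourselves.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        with self._lock:
            rows = self._conn.execute(sql, params).fetchall()
            self._conn.commit()
            return rows


db = LockedConnection(":memory:")
db.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")


def worker(i):
    db.execute("INSERT INTO cache VALUES (?, ?)", (f"k{i}", f"v{i}"))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = db.execute("SELECT COUNT(*) FROM cache")[0][0]
print(count)
```

Fetching the rows while the lock is held means no cursor is left open for another thread to interfere with.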

Can you share your branch? Maybe I can help.

sjohannes commented 1 day ago

Thanks, that article is very helpful. I've made some more changes and pushed my current work to the sqlite branch in this repository. It now works correctly for lyrics.cache as well. It's currently missing tests, and the old db migration tests need to be adapted.

I'm still not completely sure about the impact of the sqlite3.threadsafety level (DB-API doc) on all this. My understanding is that we require level 1 at least, because at level 0 we need to have one lock for all SQLite calls, even the ones that use different connections. I think it's unlikely for any desktop distro to compile SQLite in single-thread mode (level 0) though.
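For reference, a small sketch of how the level could be checked at runtime. The helper names are hypothetical; the level descriptions follow DB-API 2.0 (PEP 249). Note that before Python 3.11 the `sqlite3.threadsafety` attribute was hard-coded to 1, so it only reflects how the underlying SQLite library was compiled on 3.11 and later.

```python
import sqlite3

# DB-API 2.0 (PEP 249) meanings of the threadsafety levels.
LEVELS = {
    0: "threads may not share the module",
    1: "threads may share the module, but not connections",
    2: "threads may share the module and connections",
    3: "threads may share the module, connections and cursors",
}


def describe_threadsafety():
    """Return a human-readable description of the current level."""
    level = sqlite3.threadsafety
    return f"level {level}: {LEVELS.get(level, 'unknown')}"


def needs_global_lock():
    """At level 0, every SQLite call would need to share one lock."""
    return sqlite3.threadsafety == 0


print(describe_threadsafety())
print("global lock needed:", needs_global_lock())
```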