netvl closed this issue 5 years ago
Sounds good! See also #2626 for some prior discussion.
I think the solution has two parts. First, the beets user needs the ICU extension for SQLite on their machine (at least on Linux systems, where the system SQLite library is usually used). Unfortunately, this seems to be quite obscure: there appears to be no ICU extension anywhere in the Arch Linux packages, including the AUR, which suggests that it is even worse in other distros (except maybe Gentoo and similar distributions). All the instructions on the net suggest building the extension manually.
Second, beets should load this extension when it queries the database. From what I understand, this has to be an explicit action on the client side of the database. After it is done, LIKE queries should handle character case automatically.
I wonder, could this (loading an SQLite extension) be done through a plugin? If yes, then I think adding a plugin with an optional explicit path to the extension shared object (possibly using some system directories, if it is not specified), which loads this extension, could be a nice solution.
Building SQLite with the ICU module enabled does indeed help, and makes it work automatically without any changes on the beets side. This more or less solves the issue for me specifically, since I can build SQLite with the necessary options. But I wonder: across all platforms, how often do maintainers build SQLite with the ICU extension built in?
That's great to hear @netvl, thanks for doing some digging! It's awesome to hear that beets doesn't need any changes to support this, and I've been playing around with using a dynamic library for it too.
It does seem that we can load the ICU extension into beets' SQLite database at startup using a plugin, although we do need to reach into the beets internals a bit:

    from __future__ import division, absolute_import, print_function

    from beets.plugins import BeetsPlugin


    class LoadIcuPlugin(BeetsPlugin):
        def __init__(self):
            super(LoadIcuPlugin, self).__init__()
            # Run our hook once beets has opened the library database.
            self.register_listener('library_opened', self.library_opened)

        def library_opened(self, lib):
            # _connection() is a private dbcore method.
            conn = lib._connection()
            # sqlite3 disables extension loading by default.
            conn.enable_load_extension(True)
            conn.load_extension('libicu.so')

It seems that we could also call db.enable_load_extension(True) inside dbcore and then just use the Transaction#query method to run SELECT load_extension("libicu.so"), which doesn't require calling any private methods (like _connection()).
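To illustrate the second approach outside of beets, here is a minimal sketch that uses only Python's standard sqlite3 module rather than dbcore. The helper name load_via_sql and the library path are hypothetical, and it assumes the interpreter's sqlite3 module was built with extension loading support.

```python
import sqlite3


def load_via_sql(conn, path):
    """Try to load an SQLite extension via the load_extension() SQL function.

    Returns True on success, False if loading is unsupported or fails.
    `path` is a placeholder: the real location of the ICU library varies.
    """
    try:
        # Extension loading is off by default and has to be enabled on the
        # connection before load_extension() may be called.
        conn.enable_load_extension(True)
    except (AttributeError, sqlite3.Error):
        # Some Python builds ship sqlite3 without extension loading support.
        return False
    try:
        conn.execute("SELECT load_extension(?)", (path,))
        return True
    except sqlite3.Error:
        return False
    finally:
        # Switch loading back off so later queries cannot load more code.
        conn.enable_load_extension(False)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_via_sql(conn, "libicu.so"))
```

Inside beets, the same SQL statement would go through Transaction#query as described above; that part is not shown here.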
I think this could be made into a simple LoadExtPlugin which takes a list of SQLite extensions to load from your beets configuration.
That's cool! We could even fold this into beets core if we could make it optional. Is it easy to load the plugin if it's available, and silently do nothing if it's not?
The main issue is that you need a path to the plugin. I guess we could make it a core configuration option to provide the path? I'm not sure if there's a "standardised" path for SQLite plugins.
Ah, of course, that makes sense! In that case, a beets plugin for loading these could be the right path.
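As a rough idea of what "load it if it's available, otherwise do nothing" could look like, here is a sketch that takes a list of paths (as they might come from a configuration option) and skips any that fail to load. It uses plain sqlite3, not beets code; the function name load_optional_extensions is an assumption made up for illustration.

```python
import logging
import sqlite3

log = logging.getLogger(__name__)


def load_optional_extensions(conn, paths):
    """Load each extension in `paths`, skipping any that cannot be loaded.

    Returns the list of paths that were loaded successfully.
    """
    loaded = []
    try:
        conn.enable_load_extension(True)
    except (AttributeError, sqlite3.Error):
        # This Python build cannot load SQLite extensions at all.
        log.debug("sqlite3 extension loading is not available")
        return loaded
    try:
        for path in paths:
            try:
                conn.load_extension(path)
            except sqlite3.Error as exc:
                # A missing or incompatible library is not fatal: note it
                # at debug level and carry on with the next path.
                log.debug("skipping extension %s: %s", path, exc)
            else:
                loaded.append(path)
    finally:
        conn.enable_load_extension(False)
    return loaded
```

Failures are only logged at debug level, so a user without the extension installed would see no difference in normal operation.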
Even with ICU enabled, there are still edge cases that won’t work. I have found many with French accented letters. For example, the song "À quoi ça sert l'amour (avec Théo Sarapo)" by "Édith Piaf" cannot be found using these:
beet ls edith piaf
beet ls Edith piaf
beet ls a quoi ca
but will be found using these:
beet ls édith piaf
beet ls Édith piaf
beet ls a quoi ça
beet ls A QUOI ÇA
It appears that an accented À is handled differently from an accented É, ditto for ç.
This is not a beets problem but a problem with SQLite’s handling of ICU, LIKE and level 1 collation.
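The failing queries above need accent-insensitive matching, which is a separate step from case-insensitive matching. The sketch below shows the difference in plain Python using the standard unicodedata module. It is not how SQLite or ICU implement it, and the function names are made up for illustration.

```python
import unicodedata


def fold_case(text):
    """Case-insensitive key: folds case but keeps accents."""
    return text.casefold()


def fold_case_and_accents(text):
    """Case- and accent-insensitive key.

    Decomposes each character (NFD), drops the combining marks,
    then folds case.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    artist = "Édith Piaf"
    print(fold_case(artist))
    print(fold_case_and_accents(artist))
```

With only case folding, "edith piaf" still differs from the folded artist name, which matches the behaviour reported for `beet ls edith piaf`.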
Problem
When searching for metadata written in English or, in other words, in Latin script, beet ls performs a case-insensitive search, which is very nice. With non-Latin scripts, however, case-insensitive search does not work (I've checked with Cyrillic, since it is the only one in which I have tracks and which has the concept of case). Running this command:
outputs nothing, while this one:
shows all the tagged files correctly.
I would expect this to work for any script, probably based on Unicode case transformations.
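The behaviour can be reproduced outside beets with a few lines of Python. This is a sketch with a made-up table and sample titles, not the beets schema. SQLite's built-in LIKE is documented as case-insensitive only for ASCII characters, so on a build without ICU the last query is expected to come back empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("Hello",), ("Привет",)],
)


def search(term):
    """Return the titles containing `term`, using a plain LIKE query."""
    rows = conn.execute(
        "SELECT title FROM items WHERE title LIKE ?",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(search("hello"))   # ASCII: matched regardless of case
    print(search("Привет"))  # exact case: matched
    print(search("привет"))  # different case, non-ASCII: depends on ICU
```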
Setup
My configuration (output of beet config) is: