ryanakca closed this issue 1 year ago.
At least in openSUSE, SQLite has JSON support by default. Do you know if that's a reasonable dependency to add to beets? If so, I'll try to change my PR to use JSON values. BTW, would it make sense to add database schema versioning? I'm thinking it would be best if we didn't mix plain string values with JSON values in a given column, so some code to "migrate" the column values would be needed (that is, in database format v1, genre etc. store strings, while in v2, JSON values are stored).
Just for the record:
sqlite> CREATE TABLE songs (id INTEGER PRIMARY KEY ASC AUTOINCREMENT, genre TEXT );
sqlite> insert into songs (genre) VALUES (json_array('rock', 'blues'));
sqlite> insert into songs (genre) VALUES (json_array('pop'));
sqlite> select songs.id, json_genre.value from songs, json_each(songs.genre) as json_genre where json_genre.value = 'rock';
id|value
1|rock
sqlite> select songs.id, json_genre.value from songs, json_each(songs.genre) as json_genre where json_genre.value like '%o%';
id|value
1|rock
2|pop
So this looks promising :)
Good question—I don't know how pervasive that version of SQLite is. That would require some careful searching into the state of various OSes. On the other hand, nothing stops us from storing JSON in our database even if we can't use the fancy queries over it yet.
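A runtime check could let beets decide this per installation. Here is a sketch using only the standard sqlite3 module (nothing beets-specific); it probes for json1 and, when present, replicates the shell session above from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # json_array() only exists when json1 is compiled in; older
    # builds raise OperationalError ("no such function").
    conn.execute("SELECT json_array('probe')")
    json1_available = True
except sqlite3.OperationalError:
    json1_available = False

if json1_available:
    conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, genre TEXT)")
    conn.execute("INSERT INTO songs (genre) VALUES (json_array('rock', 'blues'))")
    conn.execute("INSERT INTO songs (genre) VALUES (json_array('pop'))")
    rows = conn.execute(
        "SELECT songs.id, j.value "
        "FROM songs, json_each(songs.genre) AS j "
        "WHERE j.value LIKE '%o%'"
    ).fetchall()
    print(rows)  # [(1, 'rock'), (2, 'pop')]
```

A check like this is cheap enough to run once at startup, which is what a graceful-degradation design would need anyway.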
I suppose you're right that the nice thing about the null-separated encoding is that there is no migration required at all. A single-element list is exactly the same as a plain string.
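To make that migration-free property concrete, here is a sketch of the null-separated encoding (the SEP, encode, and decode names are hypothetical, not beets code): a one-element list round-trips to the plain string itself, so existing single-valued rows need no conversion.

```python
# Null byte as the list separator; it cannot appear in normal tag text.
SEP = "\x00"

def encode(values):
    """Encode a list of strings into a single stored string."""
    return SEP.join(values)

def decode(stored):
    """Decode a stored string back into a list of strings."""
    # A plain string (no separator) decodes to a one-element list.
    return stored.split(SEP) if stored else []

assert encode(["rock"]) == "rock"          # identical to the old format
assert decode("rock") == ["rock"]          # old rows read back unchanged
assert decode(encode(["rock", "blues"])) == ["rock", "blues"]
```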
About versioning: so far, we've been able to avoid it. The schema benefits from being so incredibly simple that we never really need to grapple with complicated migrations. We might need to explore it now, of course!
I don't know how pervasive that version of SQLite is. That would require some careful searching into the state of various OSes
As I'm very interested in this feature, I did some research on this:
Debian (and probably Ubuntu): enabled in stretch since 2015-10-19
disabled in jessie and older
Arch : enabled since 2016-01-19
Fedora : enabled since 2016-01-22
CentOS : disabled
OpenSUSE: enabled
FreeBSD : disabled by default, but simple to enable
Windows : enabled
Mac OS X: disabled according to https://docs.python.org/3/library/sqlite3.html#f1
The biggest barrier is probably Mac OS X.
Hmm; that is tricky. As usual, macOS is behind.
Maybe we can architect this feature so it degrades gracefully on systems without the appropriate SQLite feature?
While you guys are thinking of some generic solution, I came up with the following hack to mediafile.py:
class ListMediaField(MediaField):
    """Property descriptor that retrieves a list of multiple values from
    a tag.

    Uses the ``get_list`` and ``set_list`` methods of its ``StorageStyle``
    strategies to do the actual work.
    """
    def __get__(self, mediafile, _):
        values = []
        for style in self.styles(mediafile.mgfile):
            values.extend(style.get_list(mediafile.mgfile))
        return [_safe_cast(self.out_type, value) for value in values]

    def __set__(self, mediafile, values):
        for style in self.styles(mediafile.mgfile):
            style.set_list(mediafile.mgfile, values)

    def single_field(self):
        """Return a ``MediaField`` descriptor that gets and sets the
        first item.
        """
        options = {'out_type': self.out_type}
        return ListMediaSingleField(*self._styles, **options)
class ListMediaSingleField(MediaField):
    """Property descriptor that joins the multiple values of a tag into
    a single ``"; "``-separated string, and splits that string back into
    a list on assignment.

    Uses the ``get_list`` and ``set_list`` methods of its ``StorageStyle``
    strategies to do the actual work.
    """
    def __get__(self, mediafile, _):
        values = []
        for style in self.styles(mediafile.mgfile):
            values.extend(style.get_list(mediafile.mgfile))
        return "; ".join(_safe_cast(self.out_type, value) for value in values)

    def __set__(self, mediafile, value):
        values = value.split("; ")
        for style in self.styles(mediafile.mgfile):
            style.set_list(mediafile.mgfile, values)
Combined with this lastgenre config:

lastgenre:
    canonical: yes
    count: 5
    separator: '; '
This works pretty well for me, at least for MP3s and FLACs. I know that's not a production-quality solution, but it just works in my case: beets can both read and write multiple genre tags.
I was tinkering with a join table for flexible attributes and lists (I guess this is a kind of horizontal partitioning). I don't know how much "magic" it would need for building the query from user input plus a "metadata" table, as in CMSs that let you customize the fields. The idea would be something like this:
CREATE TABLE songs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT
    -- ...default fields
);
-- list type
CREATE TABLE songs_artists (
    id INTEGER,
    ord INT DEFAULT 1,
    name TEXT NOT NULL,
    joiner TEXT, -- feat
    PRIMARY KEY (id, ord),
    FOREIGN KEY (id) REFERENCES songs(id)
);
# example
fields = OrderedDict({
    'title': 'songs.title',
    'artists': 'GROUP_CONCAT(songs_artists.joiner || songs_artists.name, "")',
})
joins = [
    'JOIN songs_artists ON songs.id = songs_artists.id',
]
filter_subselect = f"SELECT id FROM songs_artists WHERE {criteria}"
sql = (
    f"SELECT {', '.join(fields.values())} "
    f"FROM songs {' '.join(joins)} "
    f"WHERE songs.id IN ({filter_subselect})"
)
# fetch and create objects / DTOs etc.
Sure! This seems like about the right design to make arbitrary fields into lists. I'd be interested to investigate what the potential performance trade-offs are…
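For reference, here is a minimal runnable sketch of the join-table idea (the table and column names follow the proposal above and are not beets' actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT
    );
    CREATE TABLE songs_artists (
        id INTEGER,
        ord INT DEFAULT 1,
        name TEXT NOT NULL,
        joiner TEXT DEFAULT '',
        PRIMARY KEY (id, ord),
        FOREIGN KEY (id) REFERENCES songs(id)
    );
""")
conn.execute("INSERT INTO songs (id, title) VALUES (1, 'Song A')")
conn.executemany(
    "INSERT INTO songs_artists (id, ord, name, joiner) VALUES (?, ?, ?, ?)",
    [(1, 1, 'Alice', ''), (1, 2, 'Bob', ' feat. ')],
)
# GROUP_CONCAT order is unspecified in SQLite, so order the rows in a
# subquery before aggregating.
artists = conn.execute("""
    SELECT GROUP_CONCAT(joiner || name, '')
    FROM (SELECT joiner, name FROM songs_artists WHERE id = 1 ORDER BY ord)
""").fetchone()[0]
print(artists)  # Alice feat. Bob
```

One performance note: the subquery-ordering trick runs once per song fetched, so bulk listings would want a single grouped query over all IDs rather than a per-row round trip.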
Maybe we can architect this feature so it degrades gracefully on systems without the appropriate SQLite feature?
For PostgreSQL, I've seen people recommend converting DBs with JSON fields to regular string fields and relying on downstream software to decode the JSON when dealing with pgsql versions prior to native JSON support. Maybe beets could also just use json if we can't get/store "real" JSON objects from native SQLite? (This also seems compatible with the json1 extension: "The json1 extension (currently) stores JSON as ordinary text."[1])
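A quick sketch of that fallback: store the list as ordinary JSON text with Python's json module and decode it downstream, so no json1 extension is needed at all (the schema here is illustrative, not beets' real one):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, genre TEXT)")

# Serialize the list to JSON text before storing it.
conn.execute("INSERT INTO songs (genre) VALUES (?)",
             (json.dumps(["rock", "blues"]),))

# Decoding happens downstream, in Python, not in SQL.
stored = conn.execute("SELECT genre FROM songs").fetchone()[0]
genres = json.loads(stored)
print(genres)  # ['rock', 'blues']
```

Since json1 itself stores JSON as ordinary text, a database written this way would remain readable by the json1 functions on systems that do have them.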
@Freso that'd mean we'd need two versions of our queries to degrade nicely on versions of SQLite without json1 support, as the main advantage of using the JSON column type is the functions supported by the SQLite json1 extension.
@jackwilsdon I think there are some other great advantages, e.g., that you can store lists of non-string objects (incl. other lists!), which is not possible with the "CSV" approach (regardless of what we decide our "C" to be). (We haven't found a use case for these though, but it's nice to go down a path that is known to be easily extendable.)
I do agree that it's a bother, and I wouldn't mind if it was decided to be a beets 2.0.0 release feature that simply has a hard requirement for json1 SQLite support. Older distributions and builds that don't have it enabled would then have to use beets <2, which is also working fine for most cases. (Mostly I was just commenting to hopefully get the issue moving along a bit. (Five years ago, @sampsyo said "We should indeed get moving on list-valued fields."…) :))
What's wrong with just using the standard relational DB option of a join table? Losing indexing and query support seems like a poor trade-off for storing simple tabular data as JSON.
@Freso I'm certainly not against storing it as JSON; my point was mainly that we wouldn't be able to use any advantages of json1 until we deprecated support for non-json1 versions of SQLite, which is less than ideal.
I guess one thing to look at is the percentage of installations that support json1 — we'd probably have to go digging in different distros, or maybe outsource some of the statistics to the community if they can't be found elsewhere.
That's great news; thanks for the heads up!
Could something like this Python package, supersqlite, be used to bridge the gap until the features we need for JSON support are officially added?
Since Python 3.9.3 is now officially released, it should have json1 support (https://github.com/python/cpython/commit/58d6f2ee3aeb699156d4784acccd2910d27982e7). Has anybody begun working on migrating this? I suspect that it will be an arduous task. If not, I might try taking a stab at it.
@beetbox Will we need to include backwards support for older databases, or would it be better to force a database upgrade? I would love to use JSON to store lists, and there are quite a few other features that could benefit from this upgrade.
It seems that a lot of the work necessary for the conversion has already been done on #2503 if anyone else wants to try porting over the changes.
Personally, I don't see an issue with forcing a database upgrade. That's probably simpler to implement and more robust than attempting to be backwards compatible? AFAIK, up to now, beets doesn't have any serious database upgrade code (except for adding new columns to the album and item tables), but I don't see a problem with adding it. We may consider a commandline prompt where users need to acknowledge the irreversible upgrade and which we could use to encourage backing up the database first.
I'm not so sure that we want to require Python 3.9.1 (released in August 2020) just yet, that's still somewhat recent. I don't think we have any idea what Python versions are common among beets users, unfortunately. I wouldn't say that we need to retain support for any Ubuntu LTS that's still out there. But maybe at least 3.8? That's not to say that work on this feature can't be started now, but I guess a few more opinions would be nice to set a realistic timeline for actually shipping json1.
I don't want to be dismissive, but I just wanna say: what would the Venn diagram of "Ubuntu users" + "beets users" look like? I bet there's not much overlap. Now, if we were to say "Ubuntu users" + "Picard users", maybe there's quite a bit of overlap.
Either way, those who are running Ubuntu (unless they use pip) are unlikely to upgrade to a new version of beets until a new release of Ubuntu appears, at which point the package gets upgraded on the Ubuntu repos.
So if you want to serve those using Ubuntu, but not pip, you'd fix up a database transition from latest Ubuntu/Ubuntu LTS to the next version of beets.
And if you want to serve those using pip on Ubuntu, perhaps it's better to just ask them to upgrade the relevant packages. They did choose to install from pip. Ubuntu users eventually learn that using Ubuntu means a commitment to running the version of the packages specifically packaged for their version of Ubuntu. If you don't, you can mess up the system stability. Truth is, it's a big commitment to run latest packages on Ubuntu.
Which brings me to my next point: Why not let us choose when to migrate the database? Perhaps you can leave a setting to use the single-value database until we can migrate. I don't know how big of a job it is to leave legacy support though. You have to figure that out yourselves.
That's my two cents as user and not developer.
Edit: I came back here by chance and noticed I have two thumbs down. At the very least, you should explain which of the things I said you disagree with. It's kinda useless to just add an emoji and call it a day; hardly an argument, I think.
Another option could be to require different Python versions per OS. macOS should support json1 starting with Python 3.7, while I think most Linux distributions have been compiling their SQLite with json1 for a while (Ubuntu since at least 16.04). So really the requirement would be Python 3.7 on macOS, Python 3.9 on Windows, and a somewhat recent Linux distribution.
Why not let us choose when to migrate the database?
You have that option by choosing when you update beets. If you’re not ready to take the jump, just don’t update beets with the new db schema until you are. :shrug:
The massive overhaul this will be to how beets works internally and with its database would make it extremely clunky to support the two different schemas simultaneously. It’s already a task that we’ve been wanting to do/implement ASAP for 7+ years—requiring backwards compatibility with old databases would only add to the already massive task of accomplishing this.
What's the latest status update on this?
This is the one feature of Beets that's keeping me from going "all in". I have thousands of files that are tagged with multiple genres.
I'm attempting to add this for artists, albumartists, and genre.
This is my first time ever dabbling in the beets codebase, so I probably won't know what I'm doing for a while, lol. But nonetheless, I'm determined to make this work.
Perhaps @jpluscplusm could share some wisdom regarding the albumtypes addition from a few weeks ago? I'm thinking the above tags could work similarly.
After carefully reading through the prior comments, I am wondering why json seems to be preferred over using a simple null-valued separator in a string field? I think the performance impact would be minimal, and require the least headache in terms of dependencies, migration, etc. I would also think making non-list fields into json for the sake of 'fields being agnostic' would incur more compute cost.
At this point, the issue has been open for almost 10 years. I think most users who care about this issue will take any solution.
I will be attempting the 'null separator in a string field' solution to this problem.
@JOJ0 I think we can close this (right before the 10 year mark 😅)
Please add support for multi-valued tags. For example, this could be used for the multi-valued occurrences of PERFORMER / COMPOSER / ENSEMBLE / etc. recommended by http://age.hobba.nl/audio/mirroredpages/ogg-tagging.html .