Open pscn opened 11 years ago
Interesting idea. MusicBrainz also has inter-entity relationships, and people have asked for those to be reflected in their beets libraries a few times. I'm not sure how the schema would work, exactly; maybe we can pack foreign keys into flexattrs.
Just store the graph as an edge list in {item,album}_attributes, like so:
INSERT INTO item_attributes (entity_id, key, value) VALUES (:from_entity_id, 'relation', :to_entity_id);
Good point, @pedros. Currently, there's only one value per entity-id/key pair, but relaxing that would be useful for this and other purposes (e.g., #119?).
But I'm looking at the schema, which appears to allow an entity_id column, right?
CREATE TABLE item_attributes (
id INTEGER PRIMARY KEY,
entity_id INTEGER,
key TEXT,
value TEXT,
UNIQUE(entity_id, key) ON CONFLICT REPLACE);
CREATE INDEX item_attributes_by_entity
ON item_attributes (entity_id);
In Python-land, it would be something like:
item.relation = another_item.id
Or am I missing something?
Yes, I think we're on the same page. The only difficulty with the schema as it currently stands is that you couldn't have multiple relations of the same type for the same item. More technically, for each relationship type, the graph must be directed with maximum out-degree 1. By relaxing the UNIQUE(entity_id, key) constraint, we could get a higher-degree graph, which might be useful.
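To make the out-degree limit concrete, here is a minimal sketch using Python's sqlite3 module and the item_attributes schema quoted above. The item IDs (1, 2, 3) and the 'relation' key are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_attributes (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER,
        key TEXT,
        value TEXT,
        UNIQUE(entity_id, key) ON CONFLICT REPLACE)
""")

# Try to give item 1 two 'relation' edges: one to item 2, one to item 3.
for target in (2, 3):
    conn.execute(
        "INSERT INTO item_attributes (entity_id, key, value) VALUES (?, ?, ?)",
        (1, "relation", str(target)),
    )

rows = conn.execute(
    "SELECT value FROM item_attributes "
    "WHERE entity_id = 1 AND key = 'relation'"
).fetchall()
print(rows)  # only the edge to item 3 survives: [('3',)]
```

The second insert silently replaces the first, so each item can hold at most one outgoing edge per relation name.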
Oh, I missed the UNIQUE constraint.
Cool that you like the idea.
Why not add new tables for this feature? I'm not too thrilled about merging it into flexattrs, because it would make flexattrs handling a lot more complicated. How about something like this?
CREATE TABLE item_relations (
id INTEGER PRIMARY KEY,
from_entity_id INTEGER,
to_entity_id INTEGER,
key TEXT,
value TEXT,
UNIQUE(from_entity_id, key, to_entity_id) ON CONFLICT REPLACE);
CREATE INDEX item_relations_by_from_entity
ON item_relations (from_entity_id);
CREATE INDEX item_relations_by_to_entity
ON item_relations (to_entity_id);
EDIT: As we would usually look at this table from one entity, the index should probably only be on from_entity_id and not on both ids.
EDIT²: Added index on to_entity_id to provide for reverse look-ups.
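As a rough illustration of the forward and reverse look-ups this table would support, here is a sketch with Python's sqlite3 module. The edge data is invented, and the queries are only one plausible way to read the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_relations (
        id INTEGER PRIMARY KEY,
        from_entity_id INTEGER,
        to_entity_id INTEGER,
        key TEXT,
        value TEXT,
        UNIQUE(from_entity_id, key, to_entity_id) ON CONFLICT REPLACE);
    CREATE INDEX item_relations_by_from_entity
        ON item_relations (from_entity_id);
    CREATE INDEX item_relations_by_to_entity
        ON item_relations (to_entity_id);
""")

# Made-up edges: (from, to, relation name, weight stored as text).
edges = [
    (1, 2, "similarity", "0.8"),
    (1, 3, "similarity", "0.4"),
    (3, 2, "similarity", "0.6"),
]
conn.executemany(
    "INSERT INTO item_relations (from_entity_id, to_entity_id, key, value) "
    "VALUES (?, ?, ?, ?)",
    edges,
)

# Forward look-up: everything item 1 points at.
outgoing = conn.execute(
    "SELECT to_entity_id, value FROM item_relations "
    "WHERE from_entity_id = ? AND key = ? ORDER BY to_entity_id",
    (1, "similarity"),
).fetchall()

# Reverse look-up: everything that points at item 2.
incoming = conn.execute(
    "SELECT from_entity_id, value FROM item_relations "
    "WHERE to_entity_id = ? AND key = ? ORDER BY from_entity_id",
    (2, "similarity"),
).fetchall()
```

With the three-column UNIQUE constraint, an item can have several edges of the same relation name, as long as they point at different targets.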
Makes sense to me.
I started a new branch beets/relation for first POCs etc.
How would we like to work with this in Python?
I'm thinking about adding a virtual relation field to the FlexModel (or LibModel?) so that we could work with an Item like this:
Item.relations.get(otherItem, 'similarity', None)
Item.relations.add(otherItem, 'play_count', 0)
or like this?
Item.relations[otherItem.id].similarity = 0.5
Item.relations[otherItem.id].skip_count += 1
Thanks for looking into this.
I'm actually a little concerned about adding a separate table for this purpose. We should certainly explore it, but here's my concern in the abstract: beets would quickly get difficult to maintain if we added new schema elements for every new feature. A new table means new code in each of the CRUD methods for each of the two models and new potential migration headaches. Part of the motivation for flexattrs was to provide a single, well-maintained abstraction that could cover a broad range of use cases. With them in place, we no longer need new tables to implement tags, ratings, and many other features people have proposed.
So while a dedicated schema for relationships might itself be cleaner (and maybe even more efficient), it would be awesome if we could find an elegant way to fold them into a single abstraction with flexattrs.
Does that make sense? I'm not totally sure how the combined feature would work, but I think it's worth pondering.
I see. And after some thinking I also understand ;)
If we wanted to stick with flexattrs, we could make the to_entity_id part of the flexattrs name. For example:
'_rel|{0}:{1}'.format('similarity', other_obj.id)
The nice part is that I could start writing plug-ins using relations right away :)
The questions remaining: Should we define a naming scheme for this and maybe add helper methods to FlexModel? Like:
# add a new relation to other_obj. maybe check if they are of the same class
item.add_relation(other_obj, relation_name, value)
# get the relation to the other_obj. if no relation exists, return the default
item.get_relation(other_obj, relation_name, default=None)
# get all relations or all relations matching relation_name
# generator yielding other_obj, relation_name and value
item.get_relations(relation_name=None)
The implementation would be trivial.
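For what it's worth, a minimal sketch of those three helpers might look like the following. FakeItem is a hypothetical stand-in (a plain dict of flexattrs plus an id), not the real beets model, and get_relations yields the other object's id rather than the object itself because this sketch has no library to look it up in.

```python
REL_PREFIX = "_rel|"  # naming scheme proposed above; not an existing beets convention


class FakeItem(dict):
    """Hypothetical stand-in for a beets model: a dict of flexattrs plus an id."""

    def __init__(self, item_id):
        super().__init__()
        self.id = item_id

    @staticmethod
    def _rel_key(relation_name, other_id):
        return "{0}{1}:{2}".format(REL_PREFIX, relation_name, other_id)

    def add_relation(self, other_obj, relation_name, value):
        # Store the weight under a flexattr whose name encodes the target id.
        self[self._rel_key(relation_name, other_obj.id)] = value

    def get_relation(self, other_obj, relation_name, default=None):
        return self.get(self._rel_key(relation_name, other_obj.id), default)

    def get_relations(self, relation_name=None):
        # Yields (other_id, relation_name, value) for every matching flexattr.
        for key, value in self.items():
            if not key.startswith(REL_PREFIX):
                continue
            name, _, other_id = key[len(REL_PREFIX):].rpartition(":")
            if relation_name is None or name == relation_name:
                yield int(other_id), name, value


a, b = FakeItem(1), FakeItem(2)
a.add_relation(b, "similarity", 0.5)
```

Splitting on the last colon means relation names may contain colons, but the target id may not.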
As an extension to that, I was thinking of adding a direction parameter. This would allow a relation to go both ways or just one way. For example, the Last.fm similarity from track A to track B is different from the one from track B to track A. On the other hand, the Echo Nest distance from track A to track B is the same both ways. So:
# add a new relation between item and other_obj.
# if direction is 'out' only from item to other_obj.
# if direction is 'in' only from other_obj to item
# if direction is 'bi' both ways
item.add_relation(other_obj, relation_name, value, direction='out')
# get the relation to the other_obj. if no relation exists, return the default
# direction can be 'in' or 'out'
item.get_relation(other_obj, relation_name, default=None, direction='out')
# get all relations or all relations matching relation_name
# generator yielding other_obj, relation_name and value
# direction can be 'in', 'out' or 'bi'
item.get_relations(relation_name=None, direction='out')
Would that be feasible?
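One way the direction parameter could work, sketched as plain functions over a hypothetical in-memory library (a dict mapping item id to that item's flexattrs). The idea, which is an assumption of this sketch and not settled design, is that an edge always lives on its source item, so 'in' just writes to the other item and 'bi' writes to both.

```python
def _rel_key(relation_name, target_id):
    # Flexattr name encoding the relation name and the target id.
    return "_rel|{0}:{1}".format(relation_name, target_id)


def add_relation(library, item_id, other_id, relation_name, value,
                 direction="out"):
    if direction not in ("out", "in", "bi"):
        raise ValueError("direction must be 'out', 'in' or 'bi'")
    if direction in ("out", "bi"):
        library[item_id][_rel_key(relation_name, other_id)] = value
    if direction in ("in", "bi"):
        library[other_id][_rel_key(relation_name, item_id)] = value


def get_relation(library, item_id, other_id, relation_name, default=None,
                 direction="out"):
    if direction == "out":
        return library[item_id].get(_rel_key(relation_name, other_id), default)
    if direction == "in":
        return library[other_id].get(_rel_key(relation_name, item_id), default)
    raise ValueError("direction must be 'out' or 'in'")


# Hypothetical library with two items and no flexattrs yet.
library = {1: {}, 2: {}}
add_relation(library, 1, 2, "lastfm_similarity", 0.7)         # one way only
add_relation(library, 1, 2, "distance", 0.3, direction="bi")  # symmetric
```

A 'bi' relation stores the value twice, so updating it later means writing both copies again.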
Cool! Yes, this approach seems right for use cases like similarity—here, you potentially want scores (weights) for every pair of items (edges in the graph). So encoding the second ID in the name string makes a certain amount of sense. For other cases, such as "track X is a live version of track Y", you don't need the weight and the graph isn't complete, so another approach (e.g., a list of foreign keys packed into the "live_versions" flexattr) might make more sense.
Maybe it's worth prototyping this for the use case you have in mind (similarity as calculated by acoustic parameters?) and see how it plays out. We could learn something about the right API design that way.
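For the unweighted case, packing a list of foreign keys into one flexattr could be as simple as the sketch below. The comma-separated encoding and the pack_ids/unpack_ids helpers are assumptions for illustration; nothing like them exists in beets today.

```python
def pack_ids(ids):
    """Encode item ids as a comma-separated string, sorted and de-duplicated."""
    return ",".join(str(i) for i in sorted(set(ids)))


def unpack_ids(packed):
    """Decode a packed string back into a list of integer ids."""
    return [int(part) for part in packed.split(",") if part]


# Hypothetical flexattrs for one track that has two live versions.
flexattrs = {}
flexattrs["live_versions"] = pack_ids([42, 17, 42])
live_versions = unpack_ids(flexattrs["live_versions"])
```

This keeps the one-value-per-key constraint intact, at the cost of making the relation hard to query from SQL.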
I can certainly try a prototype for this, keeping other use cases in mind. Maybe naming the methods add_edge instead of add_relation, or something along those lines. Or maybe trying to be smart and determining the kind of relation by looking at the value, e.g., an empty value indicates a weightless relation ...
However: one major drawback I see in using flexattrs is cleaning up after deleted entities. With an extra table, that would be a piece of cake. With flexattrs, not so much. Every plug-in introducing relations of any kind needs to add a verify or clean-up stage. Or we need to classify flexattrs or ...
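To give a feel for what such a clean-up stage would involve, here is a sketch that assumes the '_rel|name:id' naming scheme from earlier in the thread and a hypothetical in-memory library (a dict mapping item id to flexattrs). It has to scan every flexattr of every item, which is exactly the cost a dedicated table would avoid.

```python
def prune_dangling_relations(library):
    """Delete relation flexattrs whose target id is no longer in the library.

    Returns the number of flexattrs removed.
    """
    removed = 0
    for flexattrs in library.values():
        # Copy the keys first, since we delete while iterating.
        for key in list(flexattrs):
            if not key.startswith("_rel|"):
                continue
            target_id = int(key.rpartition(":")[2])
            if target_id not in library:
                del flexattrs[key]
                removed += 1
    return removed


# Item 3 was deleted, but item 1 still has an edge pointing at it.
library = {
    1: {"_rel|similarity:2": 0.5, "_rel|similarity:3": 0.1, "genre": "rock"},
    2: {},
}
removed = prune_dangling_relations(library)
```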
Putting this on ice for now, as I'm not 100% sure how I want it ...
Alright, sounds good—let's leave this open, though, since I think it will be interesting to revisit.
I've been playing with the thought of adding relation information to beets. By relation I mean a weighted connection between two tracks, albums, or artists. These relations could be initialized with, e.g., similar-track or similar-artist information gathered from Last.fm or based on other data (e.g., acoustic attributes from the Echo Nest come to mind). They could later be refined manually by the user or automatically, e.g., by tracking the user's listening habits (plays and skips). This could provide for smarter playlist creation or plugins that hook into media players to fill the queue with suggested tracks. I do something like that with my Advanced Shuffle Client for MPD.
There are a lot of open questions regarding this feature, but before getting into details, I wanted to get a vibe if it is even desired to have something like this in beets.
Any opinions about this?