beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.86k stars 1.82k forks

Adding relation information between tracks, albums and artists to beets? #440

Open pscn opened 11 years ago

pscn commented 11 years ago

I've been toying with the idea of adding relation information to beets. By relation I mean a weighted connection between two tracks, albums, or artists. These relations could be initialized with, e.g., similar-track or similar-artist information gathered from Last.fm, or derived from other data (e.g., Acoustic Attributes from the Echo Nest come to mind). They could later be refined manually by the user or automatically, e.g., by tracking the user's listening habits (plays and skips).

This could allow for smarter playlist creation, or for plugins that hook into media players to fill the queue with suggested tracks. I do something like that with my Advanced Shuffle Client for MPD.

There are a lot of open questions regarding this feature, but before getting into details, I wanted to get a sense of whether something like this is even wanted in beets.

Any opinions about this?

sampsyo commented 11 years ago

Interesting idea. MusicBrainz also has inter-entity relationships, and people have asked for those to be reflected in their beets libraries a few times. I'm not sure how the schema would work, exactly; maybe we can pack foreign keys into flexattrs.

pedros commented 11 years ago

Just store the graph as an edge list in {item,album}_attributes, like so:

INSERT INTO item_attributes (entity_id, key, value)
                VALUES (:from_identity_id, 'relation', :to_identity_id);

sampsyo commented 11 years ago

Good point, @pedros. Currently, there's only one value per entity-id/key pair, but relaxing that would be useful for this and other purposes (e.g., #119?).

pedros commented 11 years ago

But I'm looking at the schema, which appears to allow an entity_id column, right?

CREATE TABLE item_attributes (
                id INTEGER PRIMARY KEY,
                entity_id INTEGER,
                key TEXT,
                value TEXT,
                UNIQUE(entity_id, key) ON CONFLICT REPLACE);
CREATE INDEX item_attributes_by_entity
                ON item_attributes (entity_id);

In Python-land, it would be something like:

item.relation = another_item.id

Or am I missing something?

sampsyo commented 11 years ago

Yes, I think we're on the same page. The only difficulty with the schema as it currently stands is that you couldn't have multiple relations of the same type for the same item. More technically, for each relationship type, the graph must be directed with maximum out-degree 1. By relaxing the UNIQUE(entity_id, key) constraint, we could get a higher-degree graph, which might be useful.
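
To make the out-degree limit concrete, here is a minimal sketch using Python's standard `sqlite3` module and the `item_attributes` schema quoted above. The item IDs (1, 2, 3) and the `'relation'` key are made up for illustration. Inserting a second `'relation'` row for the same item silently replaces the first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_attributes (
    id INTEGER PRIMARY KEY,
    entity_id INTEGER,
    key TEXT,
    value TEXT,
    UNIQUE(entity_id, key) ON CONFLICT REPLACE);
""")

insert = "INSERT INTO item_attributes (entity_id, key, value) VALUES (?, ?, ?)"
# Try to give item 1 two outgoing 'relation' edges: one to item 2, one to item 3.
conn.execute(insert, (1, "relation", "2"))
conn.execute(insert, (1, "relation", "3"))

rows = conn.execute(
    "SELECT entity_id, key, value FROM item_attributes WHERE entity_id = 1"
).fetchall()
print(rows)  # prints [(1, 'relation', '3')]: the edge to item 2 was replaced
```

Only the most recent edge survives, which is the "maximum out-degree 1" restriction.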

pedros commented 11 years ago

Oh, I missed the UNIQUE constraint.

pscn commented 11 years ago

Cool that you like the idea.

Why not add new tables for this feature? I'm not too thrilled about merging it into flexattrs, because it would make flexattrs handling a lot more complicated. How about something like this?

CREATE TABLE item_relations (
                id INTEGER PRIMARY KEY,
                from_entity_id INTEGER,
                to_entity_id INTEGER,
                key TEXT,
                value TEXT,
                UNIQUE(from_entity_id, key, to_entity_id) ON CONFLICT REPLACE);
CREATE INDEX item_relations_by_from_entity
                ON item_relations (from_entity_id);
CREATE INDEX item_relations_by_to_entity
                ON item_relations (to_entity_id);

EDIT: As we would usually look at this table from one entity, the index should probably only be on from_entity_id and not on both ids. EDIT²: Added index on to_entity_id to provide for reverse look-ups.
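
As a rough sketch of how the two indexes would be used, the snippet below builds the proposed `item_relations` table in an in-memory SQLite database and runs one forward and one reverse look-up. The sample edges and the `'similarity'` key are invented for illustration and are not data from beets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_relations (
    id INTEGER PRIMARY KEY,
    from_entity_id INTEGER,
    to_entity_id INTEGER,
    key TEXT,
    value TEXT,
    UNIQUE(from_entity_id, key, to_entity_id) ON CONFLICT REPLACE);
CREATE INDEX item_relations_by_from_entity ON item_relations (from_entity_id);
CREATE INDEX item_relations_by_to_entity ON item_relations (to_entity_id);
""")

# Hypothetical directed, weighted edges: (from, to, key, value).
edges = [
    (1, 2, "similarity", "0.8"),
    (1, 3, "similarity", "0.4"),
    (3, 1, "similarity", "0.6"),
]
conn.executemany(
    "INSERT INTO item_relations (from_entity_id, to_entity_id, key, value) "
    "VALUES (?, ?, ?, ?)",
    edges,
)

# Forward look-up: which items does item 1 point to?
outgoing = conn.execute(
    "SELECT to_entity_id, value FROM item_relations "
    "WHERE from_entity_id = ? AND key = ? ORDER BY to_entity_id",
    (1, "similarity"),
).fetchall()

# Reverse look-up: which items point to item 1?
incoming = conn.execute(
    "SELECT from_entity_id, value FROM item_relations "
    "WHERE to_entity_id = ? AND key = ? ORDER BY from_entity_id",
    (1, "similarity"),
).fetchall()
```

Unlike the flexattr table, the three-column UNIQUE constraint lets one item carry several edges of the same type.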

pedros commented 11 years ago

Makes sense to me.

pscn commented 11 years ago

I started a new branch beets/relation for first POCs etc.

pscn commented 11 years ago

How would we like to work with this in Python?

I'm thinking about adding a virtual relation field to the FlexModel (or LibModel?) so that we could work with an Item like this.

Item.relations.get(otherItem, 'similarity', None)
Item.relations.add(otherItem, 'play_count', 0)

or like this?

Item.relations[otherItem.id].similarity = 0.5
Item.relations[otherItem.id].skip_count += 1

sampsyo commented 11 years ago

Thanks for looking into this.

I'm actually a little concerned about adding a separate table for this purpose. We should certainly explore it, but here's my concern in the abstract: beets would quickly get difficult to maintain if we added new schema elements for every new feature. A new table means new code in each of the CRUD methods for each of the two models and new potential migration headaches. Part of the motivation for flexattrs was to provide a single, well-maintained abstraction that could cover a broad range of use cases. With them in place, we no longer need new tables to implement tags, ratings, and many other features people have proposed.

So while a dedicated schema for relationships might itself be cleaner (and maybe even more efficient), it would be awesome if we could find an elegant way to fold them into a single abstraction with flexattrs.

Does that make sense? I'm not totally sure how the combined feature would work, but I think it's worth pondering.

pscn commented 11 years ago

I see. And after some thinking I also understand ;)

If we wanted to stick with flexattrs we could make the to_entity_id part of the flexattrs name. For example:

'_rel|{0}:{1}'.format('similarity', other_obj.id)

The nice thing about it is that I could start writing plug-ins using relations right away :)

The questions remaining: Should we define a naming scheme for this and maybe add helper methods to FlexModel?

Like:

# add a new relation to other_obj.  maybe check if they are of the same class
item.add_relation(other_obj, relation_name, value)
# get the relation to the other_obj.  if no relation exists, return the default
item.get_relation(other_obj, relation_name, default=None)
# get all relations or all relations matching relation_name
# generator yielding other_obj, relation_name and value
item.get_relations(relation_name=None)

The implementation would be trivial.
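
A minimal sketch of what those helpers might look like, assuming the `_rel|name:id` naming scheme above. `FakeItem` is a hypothetical stand-in that keeps its flexible attributes in a plain dict; it is not the beets `FlexModel` or `Item` class. Because there is no library to look items up in, `get_relations` yields the other item's ID rather than the object itself:

```python
REL_PREFIX = "_rel|"


def rel_key(relation_name, other_id):
    """Build the flexattr name that encodes the relation and the target ID."""
    return "{0}{1}:{2}".format(REL_PREFIX, relation_name, other_id)


class FakeItem:
    """Hypothetical stand-in for a model with flexible attributes."""

    def __init__(self, id):
        self.id = id
        self.attrs = {}  # flexattr name -> value

    def add_relation(self, other_obj, relation_name, value):
        self.attrs[rel_key(relation_name, other_obj.id)] = value

    def get_relation(self, other_obj, relation_name, default=None):
        return self.attrs.get(rel_key(relation_name, other_obj.id), default)

    def get_relations(self, relation_name=None):
        """Yield (other_id, relation_name, value) for matching relations."""
        for key, value in self.attrs.items():
            if not key.startswith(REL_PREFIX):
                continue  # an ordinary flexattr, not a relation
            # Split on the last colon so relation names may contain colons.
            name, _, other_id = key[len(REL_PREFIX):].rpartition(":")
            if relation_name is None or name == relation_name:
                yield int(other_id), name, value


a, b, c = FakeItem(1), FakeItem(2), FakeItem(3)
a.add_relation(b, "similarity", 0.5)
a.add_relation(c, "play_count", 0)
similar = list(a.get_relations("similarity"))
```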

As an extension to that, I was thinking of adding a direction parameter. This would allow a relation to be set in both directions or in just one. E.g., the Last.fm similarity from track A to track B is different than from track B to track A. On the other hand, the Echo Nest distance from track A to track B is the same both ways. So:

# add a new relation between item and other_obj.
# if direction is 'out' only from item to other_obj.
# if direction is 'in' only from other_obj to item
# if direction is 'bi' both ways
item.add_relation(other_obj, relation_name, value, direction='out')
# get the relation to the other_obj.  if no relation exists, return the default
# direction can be 'in' or 'out'
item.get_relation(other_obj, relation_name, default=None, direction='out')
# get all relations or all relations matching relation_name
# generator yielding other_obj, relation_name and value
# direction can be 'in', 'out' or 'bi'
item.get_relations(relation_name=None, direction='out')

Would that be feasible?
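
One possible reading of the direction parameter, as a rough sketch only: an "out" edge is stored on the item itself, an "in" edge is stored on the other object, and "bi" writes both. The `SimpleNamespace` objects with `id` and `attrs` are hypothetical stand-ins for beets items, and the free functions are illustrative rather than an existing API:

```python
from types import SimpleNamespace


def rel_key(relation_name, other_id):
    return "_rel|{0}:{1}".format(relation_name, other_id)


def add_relation(item, other_obj, relation_name, value, direction="out"):
    if direction not in ("out", "in", "bi"):
        raise ValueError("direction must be 'out', 'in' or 'bi'")
    if direction in ("out", "bi"):
        # Edge from item to other_obj lives on item.
        item.attrs[rel_key(relation_name, other_obj.id)] = value
    if direction in ("in", "bi"):
        # Edge from other_obj to item lives on other_obj.
        other_obj.attrs[rel_key(relation_name, item.id)] = value


def get_relation(item, other_obj, relation_name, default=None, direction="out"):
    if direction == "out":
        return item.attrs.get(rel_key(relation_name, other_obj.id), default)
    if direction == "in":
        return other_obj.attrs.get(rel_key(relation_name, item.id), default)
    raise ValueError("direction must be 'out' or 'in'")


track_a = SimpleNamespace(id=1, attrs={})
track_b = SimpleNamespace(id=2, attrs={})

# Asymmetric score: stored one way only.
add_relation(track_a, track_b, "lastfm_similarity", 0.7, direction="out")
# Symmetric distance: stored both ways.
add_relation(track_a, track_b, "distance", 0.2, direction="bi")
```

Storing "in" edges on the other object keeps every look-up a plain key access, at the cost of writing symmetric relations twice.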

sampsyo commented 11 years ago

Cool! Yes, this approach seems right for use cases like similarity—here, you potentially want scores (weights) for every pair of items (edges in the graph). So encoding the second ID in the name string makes a certain amount of sense. For other cases, such as "track X is a live version of track Y", you don't need the weight and the graph isn't complete, so another approach (e.g., a list of foreign keys packed into the "live_versions" flexattr) might make more sense.
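
For the unweighted case, one way to pack a list of foreign keys into a single text-valued flexattr is a comma-separated string of IDs. This is only a sketch; the helper names `pack_ids` and `unpack_ids` and the comma format are assumptions, not anything beets defines:

```python
def pack_ids(ids):
    """Encode item IDs as a sorted, de-duplicated, comma-separated string."""
    return ",".join(str(i) for i in sorted(set(ids)))


def unpack_ids(packed):
    """Decode a string produced by pack_ids back into a list of ints."""
    return [int(part) for part in packed.split(",") if part]


# Hypothetical: track X has live versions with IDs 7 and 3.
live_versions = pack_ids([7, 3, 7])
```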

Maybe it's worth prototyping this for the use case you have in mind (similarity as calculated by acoustic parameters?) and see how it plays out. We could learn something about the right API design that way.

pscn commented 11 years ago

I can certainly try a prototype for this, keeping other use cases in mind. Maybe the methods should be named something like add_edge instead of add_relation. Or maybe we could try to be smart and determine the kind of relation by looking at the value, e.g., an empty value indicates a weightless relation ...

However: One major drawback I see in using flexattrs is cleaning up after deleted entities. With an extra table, that would be a piece of cake. With flexattrs, not so much. Every plug-in introducing relations of any kind needs to add a verify or clean-up stage. Or we need to classify flexattrs or ...
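
To illustrate what such a clean-up stage might involve, here is a sketch that scans every item's flexattrs for `_rel|name:id` keys and drops those whose target ID no longer exists. The dict-of-dicts `library` is a hypothetical stand-in for the beets database, and `prune_stale_relations` is an invented name:

```python
def prune_stale_relations(attrs_by_item):
    """Delete relation keys that point at IDs missing from attrs_by_item.

    attrs_by_item maps item ID -> dict of flexattrs. Returns the number
    of keys removed. Keys without a numeric ID suffix are left alone.
    """
    live_ids = set(attrs_by_item)
    removed = 0
    for attrs in attrs_by_item.values():
        for key in list(attrs):  # copy the keys, since we delete while looping
            if not key.startswith("_rel|"):
                continue
            other_id = key.rpartition(":")[2]
            if other_id.isdigit() and int(other_id) not in live_ids:
                del attrs[key]
                removed += 1
    return removed


# Item 9 has been deleted, but item 1 still holds a relation to it.
library = {
    1: {"_rel|similarity:2": 0.5, "_rel|similarity:9": 0.1, "genre": "rock"},
    2: {"_rel|similarity:1": 0.5},
}
removed_count = prune_stale_relations(library)
```

Note that this has to visit every attribute of every item, whereas with a dedicated table the same clean-up would be a single DELETE on `from_entity_id` and `to_entity_id`.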

pscn commented 11 years ago

Putting this on ice for now, as I'm not 100% sure how I want it ...

sampsyo commented 11 years ago

Alright, sounds good—let's leave this open, though, since I think it will be interesting to revisit.