devsnd / cherrymusic

Stream your own music collection to all your devices! The easy to use free and open-source music streaming server.
http://www.fomori.org/cherrymusic
GNU General Public License v3.0

searchable ID3 tags #5

Open tilboerner opened 12 years ago

devsnd commented 12 years ago

A possible solution, since eyeD3 is only available for Python 2:

http://code.google.com/p/stagger/

devsnd commented 12 years ago

It should be possible to use either the files or the id3tags or both to index the music.

- a well-sorted collection can just be added by file
- a mutilated Windows Media Player or iTunes collection can be added with ID3 tags only
- both can be used if it's a mixed collection
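
A minimal sketch of the three indexing modes listed above, assuming the tags have already been read into a plain dict. The names `index_terms`, `split_terms` and the `mode` values are hypothetical, and how the tags are actually read (stagger, tagpy, ...) is left open.

```python
import os
import re


def split_terms(text):
    """Lowercase a string and split it into search terms."""
    return [t for t in re.split(r"[\W_]+", text.lower()) if t]


def index_terms(path, tags, mode="both"):
    """Collect the search terms for one file.

    mode is 'files' (path only), 'tags' (ID3 tags only) or 'both'.
    tags is a plain dict such as {'artist': ..., 'title': ...}.
    """
    if mode not in ("files", "tags", "both"):
        raise ValueError("unknown mode: %r" % (mode,))
    terms = set()
    if mode in ("files", "both"):
        # Drop the extension so 'mp3' does not become a search term.
        stem, _ext = os.path.splitext(path)
        terms.update(split_terms(stem))
    if mode in ("tags", "both"):
        for value in tags.values():
            terms.update(split_terms(str(value)))
    return terms
```

With `mode="files"` a badly tagged but well-sorted collection is indexed from its paths alone; with `mode="tags"` the paths are ignored entirely.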
devsnd commented 11 years ago

Yet another good ID3 tag library would be http://pypi.python.org/pypi/tagpy

devsnd commented 11 years ago

Proposal: a file meta table:

types = {1:'artist', 2:'album', 3:'title', 4:'year', 5:'genre'}

int type | int did
1        | 100
2        | 101
3        | 102

did refers to a dictionary entry. Should these entries be mixed in with the file dictionary? If so, it would be much lighter on the hard drive, but it also means that we couldn't search for an artist name directly. The search results would be mixed, because a dictionary entry makes no distinction between file name and meta info; that would only be resolved later by another table similar to the search table.
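
To make the trade-off concrete, here is a rough sketch of the proposed meta table next to a shared dictionary, using an in-memory SQLite database. The table and column names (`dictionary`, `meta`, `word`) and the sample words are assumptions for illustration only.

```python
import sqlite3

TYPES = {1: "artist", 2: "album", 3: "title", 4: "year", 5: "genre"}

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dictionary (did INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE meta (type INTEGER NOT NULL, did INTEGER NOT NULL);
""")
# One shared dictionary: words from tags and from file names live together.
con.executemany(
    "INSERT INTO dictionary VALUES (?, ?)",
    [(100, "someartist"), (101, "somealbum"), (102, "sometitle"), (103, "track01")],
)
# Only the first three words are known to be meta info.
con.executemany("INSERT INTO meta VALUES (?, ?)", [(1, 100), (2, 101), (3, 102)])


def meta_types(word):
    """Resolve which tag types a dictionary word is used as (if any)."""
    rows = con.execute(
        "SELECT m.type FROM dictionary d JOIN meta m ON m.did = d.did "
        "WHERE d.word = ? ORDER BY m.type",
        (word,),
    ).fetchall()
    return [TYPES[t] for (t,) in rows]
```

A dictionary hit alone cannot tell an artist name from a file name; only the extra join through `meta` does, which is the "resolved later" step described above.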

tilboerner commented 11 years ago

Yeah, I'd say we use one dictionary to keep the terms for every search. What would be the benefit of duplicating wordlists?

But I don't get what the tagtype to did table is good for. Is there really no column for file id? Because at some point, we need to have rows somewhere that associate words with tagtypes with files. Having an extra table without files would duplicate data.

If performance allows it, I'd keep it all together in one search table:

tid did fid
: : :

Because, you see, tid = 0 is the filename tag. (There should be a table for tagtypes, too. Names of types are unique.)
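
A sketch of that single search table in SQLite, under the assumption that the dictionary maps `did` to a word. Everything except `tid`, `did`, `fid` and the rule "tid 0 is the filename" is made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tagtypes (tid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE dictionary (did INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE search (tid INTEGER NOT NULL, did INTEGER NOT NULL, fid INTEGER NOT NULL);
    CREATE INDEX search_did ON search (did, tid);
""")
con.executemany(
    "INSERT INTO tagtypes VALUES (?, ?)",
    [(0, "filename"), (1, "artist"), (2, "album")],
)
con.executemany("INSERT INTO dictionary VALUES (?, ?)", [(1, "foo"), (2, "bar")])
con.executemany(
    "INSERT INTO search VALUES (?, ?, ?)",
    [
        (0, 1, 10),  # file 10 has 'foo' in its file name
        (1, 1, 11),  # file 11 has 'foo' as artist
        (2, 2, 11),  # file 11 has 'bar' as album
    ],
)


def find_files(word, tagtype=None):
    """Return file ids matching a word, optionally limited to one tag type."""
    sql = (
        "SELECT DISTINCT s.fid FROM search s "
        "JOIN dictionary d ON d.did = s.did "
        "JOIN tagtypes t ON t.tid = s.tid WHERE d.word = ?"
    )
    params = [word]
    if tagtype is not None:
        sql += " AND t.name = ?"
        params.append(tagtype)
    return [fid for (fid,) in con.execute(sql + " ORDER BY s.fid", params)]
```

An unrestricted search mixes file-name and tag hits (`find_files("foo")`), while passing a tag type narrows it down (`find_files("foo", "artist")`).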

devsnd commented 11 years ago

I thought a little more about it:

The dictionary can stay, I'm okay with that, but if we really want to have some advantage by using ID-tags, we should make it a tree. I'll paint a picture. [image: id3-tree]

Since all those relationships are bi-directional in OR-mapping, we can now determine the album of a track, or search for a title, or list all albums of an artist and so on.

tilboerner commented 11 years ago

Not sure if I'm getting the picture. (It's late, there's wine involved.) But I know that there are albums that feature multiple artists and tracks that appear on several albums, as there are tracks that are on no album. Genre tags are weird enough to me to be considered random strings. Album and year, ok, yeah.

More importantly, all the examples you mention can be found in a flat structure as well, by the right query; so it's performance you're after?
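
As a rough illustration of "the right query" on a flat structure, the sketch below stores one row per (file, tag type, value) and answers the tree-style questions with a self-join. The table layout and all sample values are placeholders, not the project's schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tags (fid INTEGER, type TEXT, value TEXT)")
con.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        (1, "artist", "Artist A"), (1, "album", "Album X"), (1, "title", "Song One"),
        # the same track appears on a second album
        (2, "artist", "Artist A"), (2, "album", "Album Y"), (2, "title", "Song One"),
        # Album Y features a second artist
        (3, "artist", "Artist B"), (3, "album", "Album Y"), (3, "title", "Song Two"),
        # a track that is on no album
        (4, "artist", "Artist A"), (4, "title", "Loose Track"),
    ],
)


def related(from_type, from_value, to_type):
    """List values of to_type on files that carry (from_type, from_value)."""
    rows = con.execute(
        "SELECT DISTINCT b.value FROM tags a JOIN tags b ON b.fid = a.fid "
        "WHERE a.type = ? AND a.value = ? AND b.type = ? ORDER BY b.value",
        (from_type, from_value, to_type),
    )
    return [value for (value,) in rows]
```

`related("artist", "Artist A", "album")` lists an artist's albums, `related("title", "Song One", "album")` finds the albums of a track, and the multi-artist album and the album-less track need no special handling.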

There are so many ways to sort and categorize music. Artist -> album -> track is merely one of them. If there weren't more, we wouldn't need playlists. That's why I think there should be a very open and flexible tagging system, as a means to express and discover relationships, that goes beyond this traditional structure (as it might be encoded in the file system) or playlists that were compiled manually. I'd refrain from encoding such relationships in the database for that reason, so there's no bias to limit what can be reasonably done. I think I need to work out this argument a little better...

devsnd commented 11 years ago

Ah, I get your point. So you want the ID3 tags just to be a special kind of tagging (reserved tag type indexes)?

If I get you right, then I like that idea very much. We can also introduce another tag type called star, which would then be the upvote. Then we could determine who tagged what by introducing another field for userid. userid 0 or -1 would then correspond to the imaginary user "system".

tagtype id | dictionary id | file id | user id
:          | :             | :       | :

Then it would be easy to issue searches like: show all 'star'ed albums by the artist xy. Or show the favorites of user abc.
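
A sketch of how such searches might look with the extra user column. The assumptions here: the system user gets id 0, a star row carries no dictionary entry (NULL), and the table is called `tagging`; none of this is settled in the discussion.

```python
import sqlite3

SYSTEM_UID = 0  # assumption: 0 is the imaginary "system" user
ARTIST, STAR = 1, 6  # assumption: reserved tag type ids

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dictionary (did INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE tagging (tid INTEGER NOT NULL, did INTEGER,
                          fid INTEGER NOT NULL, uid INTEGER NOT NULL);
""")
con.executemany("INSERT INTO dictionary VALUES (?, ?)", [(1, "xy"), (2, "other")])
con.executemany(
    "INSERT INTO tagging VALUES (?, ?, ?, ?)",
    [
        # ID3 tags are attached by the system user
        (ARTIST, 1, 10, SYSTEM_UID),
        (ARTIST, 1, 11, SYSTEM_UID),
        (ARTIST, 2, 12, SYSTEM_UID),
        # stars are attached by real users (1 and 2 here)
        (STAR, None, 10, 1),
        (STAR, None, 12, 1),
        (STAR, None, 11, 2),
    ],
)


def favorites(uid):
    """All files starred by one user."""
    rows = con.execute(
        "SELECT DISTINCT fid FROM tagging WHERE tid = ? AND uid = ? ORDER BY fid",
        (STAR, uid),
    )
    return [fid for (fid,) in rows]


def starred_by_artist(artist, uid):
    """Files by the given artist that the user has starred."""
    rows = con.execute(
        "SELECT DISTINCT a.fid FROM tagging a "
        "JOIN dictionary d ON d.did = a.did "
        "JOIN tagging s ON s.fid = a.fid "
        "WHERE a.tid = ? AND d.word = ? AND s.tid = ? AND s.uid = ? "
        "ORDER BY a.fid",
        (ARTIST, artist, STAR, uid),
    )
    return [fid for (fid,) in rows]
```

Here `favorites(1)` corresponds to "the favorites of user abc" and `starred_by_artist("xy", 1)` to the starred items by artist xy; grouping the results by album would be one more join on the album tag type.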

Ah, depending on sqlite not using any space for null values, system could have null as identifier, that'll remove almost half of the needed storage space. I'll check that out.

UPDATE: According to http://stackoverflow.com/questions/7051923/space-consumption-of-null-columns-in-sqlite-db, NULL apparently uses one byte, so there is no gain, since small integers are just as cheap (http://www.sqlite.org/datatype3.html):

INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

tilboerner commented 11 years ago

Yeah, I've been wondering if tags should be per-user and how to handle generally valid tags. I had the system user in mind, too. That's probably a good way to go about it. (One of 0 or -1 is already taken by nobody, so system should take the other.)

Nice thinking about making star a tag. That's the kind of thing I had in mind!

> So you want the ID3 tags just to be a special kind of tagging (reserved tag type indexes)?

Actually, not special at all:

We should allow multi-word tags, so we need to preserve the original string. Have a global tag table, without types. (Type is only relevant the moment the tag is attached to something.)

tag id tag content

This gets indexed by the dictionary. Btw, the dictionary is really just a stand-in, because we can't rely on decent fulltext search which we could use directly on the real table. Right?

How silly is it to move file names into the tag table? I'd like to do that. Even weirder: file ids should be UUIDs unless we really can't afford it, but that's part of another theory I have.
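
Roughly what I mean, as a sketch in SQLite: a global, untyped tag table that keeps the original string, a dictionary that indexes its words, and a type that only shows up when a tag is attached to a file. All table and function names here are invented for the example, and the UUID file id is just `uuid.uuid4()`.

```python
import re
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tags (tagid INTEGER PRIMARY KEY, content TEXT NOT NULL UNIQUE);
    CREATE TABLE dictionary (did INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE tagwords (did INTEGER NOT NULL, tagid INTEGER NOT NULL,
                           UNIQUE (did, tagid));
    CREATE TABLE attached (tagid INTEGER NOT NULL, type TEXT NOT NULL,
                           fid TEXT NOT NULL);
""")


def add_tag(content):
    """Store the original string once and index each of its words."""
    con.execute("INSERT OR IGNORE INTO tags (content) VALUES (?)", (content,))
    (tagid,) = con.execute(
        "SELECT tagid FROM tags WHERE content = ?", (content,)
    ).fetchone()
    for word in set(re.findall(r"\w+", content.lower())):
        con.execute("INSERT OR IGNORE INTO dictionary (word) VALUES (?)", (word,))
        (did,) = con.execute(
            "SELECT did FROM dictionary WHERE word = ?", (word,)
        ).fetchone()
        con.execute("INSERT OR IGNORE INTO tagwords VALUES (?, ?)", (did, tagid))
    return tagid


def find_tags(word):
    """Look a single word up in the dictionary and return whole tags."""
    rows = con.execute(
        "SELECT t.content FROM dictionary d "
        "JOIN tagwords w ON w.did = d.did "
        "JOIN tags t ON t.tagid = w.tagid "
        "WHERE d.word = ? ORDER BY t.content",
        (word.lower(),),
    )
    return [content for (content,) in rows]


def attach(tagid, tagtype, fid):
    """The type only becomes relevant when a tag is attached to a file."""
    con.execute("INSERT INTO attached VALUES (?, ?, ?)", (tagid, tagtype, fid))


fid = str(uuid.uuid4())  # file ids as UUIDs
attach(add_tag("Some Multi Word Artist"), "artist", fid)
attach(add_tag("some_song.mp3"), "filename", fid)  # file name as just another tag
```

Searching for one word (`find_tags("multi")`) gives back the full multi-word tag, and the file name sits in the same table as everything else.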

tilboerner commented 11 years ago

I finally got around to writing up some ideas about how tags (= all names and other data related to music) could work. I put it in the wiki, because I couldn't fit it in just two paragraphs. Let's see if we can get a discussion going over there?