forte-music / core

The core backend server for forte.

External Data Sources #36

Open 0xcaff opened 6 years ago

0xcaff commented 6 years ago

Currently, the only data used to index files is data from the song's tags and the files around the song. This doesn't seem to be enough (#18, 8e9ae36e447350802c68ef74a5d952498b4ed352, #31).

Our plan so far has been to assume the music is well tagged. Even my collection doesn't seem to be well tagged. For forte to work, we need to have good metadata about audio and rich album artwork. For these reasons, we should consider pulling data from external sources (like MusicBrainz and AcoustID).

The downside is that importing would be slower. Currently it takes about 15m to import 3k items stored on a NAS. Using AcoustID would mean that importing this many items would take 16m just to request the data from the AcoustID server (probably the main bottleneck). The old way could be hidden behind a flag.
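The 16m estimate can be reproduced with a quick calculation. This is only a sketch: it assumes one lookup per item and a limit of roughly 3 requests per second, which is an assumption on my part rather than something stated above, so check the AcoustID documentation for the current limit.

```python
def min_lookup_minutes(items: int, requests_per_second: float) -> float:
    """Lower bound on the time spent waiting on a rate-limited lookup service."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return items / requests_per_second / 60


# ASSUMPTION: about 3 requests per second, one request per item.
estimate = min_lookup_minutes(3000, 3)
print(f"{estimate:.1f} minutes")
```

This ignores fingerprinting time and network latency, so the real number would be somewhat higher.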

The quality and reliability of MusicBrainz, AcoustID and the Cover Art Archive are really good compared to what they were before. I think it is worth the extra import time to have good data. It also fits the requirement that input files will not be mutated, while still providing a good experience.

0xcaff commented 6 years ago

On the other hand, tagging music is hard and beets does it really well. Maybe we could work on interoperability with beets: some way to import your hand-tagged beets collection into forte.

0xcaff commented 6 years ago

Here's an example of the data exposed by the beets export plugin:

{
    "acoustid_fingerprint": "",
    "acoustid_id": "",
    "added": "2018-05-29 19:04:54",
    "album": "G Spot.",
    "album_id": "3",
    "albumartist": "Speedy J",
    "albumartist_credit": "Speedy J",
    "albumartist_sort": "Speedy J",
    "albumdisambig": "",
    "albumstatus": "Official",
    "albumtotal": "10",
    "albumtype": "album",
    "arranger": "",
    "artist": "Speedy J",
    "artist_credit": "Speedy J",
    "artist_sort": "Speedy J",
    "artpath": "None",
    "asin": "B000007UGD",
    "bitdepth": "0",
    "bitrate": "251kbps",
    "bpm": "0",
    "catalognum": "WARPCD27",
    "channels": "2",
    "comments": "",
    "comp": "False",
    "composer": "Jochem Paap",
    "composer_sort": "Paap, Jochem",
    "country": "GB",
    "data_source": "MusicBrainz",
    "day": "27",
    "disc": "01",
    "disctitle": "",
    "disctotal": "01",
    "encoder": "",
    "filesize": "0",
    "format": "MP3",
    "genre": "Electronic",
    "grouping": "",
    "id": "66",
    "initial_key": "",
    "label": "Warp",
    "language": "eng",
    "length": "5:06",
    "lyricist": "",
    "lyrics": "",
    "mb_albumartistid": "734fa82c-864e-468b-bee4-944cb4b1952b",
    "mb_albumid": "015aa9b3-0e76-4121-865a-1b599bc20f8c",
    "mb_artistid": "734fa82c-864e-468b-bee4-944cb4b1952b",
    "mb_releasegroupid": "88d68733-20c1-3518-9d0d-dfab72a8498a",
    "mb_releasetrackid": "a892b027-666b-345a-9f23-60ac91468c86",
    "mb_trackid": "5a28222a-c75b-4572-a4cb-6f73b776ee65",
    "media": "CD",
    "month": "03",
    "mtime": "1969-12-31 19:00:00",
    "original_day": "27",
    "original_month": "03",
    "original_year": "1995",
    "r128_album_gain": "000000",
    "r128_track_gain": "000000",
    "rg_album_gain": "0.0",
    "rg_album_peak": "0.0",
    "rg_track_gain": "0.0",
    "rg_track_peak": "0.0",
    "samplerate": "44kHz",
    "script": "Latn",
    "singleton": "False",
    "title": "Grogono",
    "track": "10",
    "track_alt": "10",
    "tracktotal": "10",
    "year": "1995"
}

It seems to be missing the path. This was generated by running:

beet export --include-keys='*' --library
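Note that every value in the export is a string, so an importer would have to coerce the fields it cares about. A minimal sketch of what that could look like, using a few keys from the sample above. The function names and the choice of fields are hypothetical, not anything forte or beets defines.

```python
import json


def parse_length(value: str) -> int:
    """Convert "M:SS" (or "H:MM:SS") into a number of seconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def coerce_item(raw: dict) -> dict:
    """Pick a few fields from one exported item and give them real types."""
    return {
        "title": raw["title"],
        "artist": raw["artist"],
        "album": raw["album"],
        "track": int(raw["track"]),
        "disc": int(raw["disc"]),
        "year": int(raw["year"]),
        "length_seconds": parse_length(raw["length"]),
        "compilation": raw["comp"] == "True",
        # Empty strings mean "unknown" in the export, so map them to None.
        "mb_trackid": raw["mb_trackid"] or None,
    }


# A subset of the exported item shown above.
sample = json.loads("""
{
    "album": "G Spot.",
    "artist": "Speedy J",
    "comp": "False",
    "disc": "01",
    "length": "5:06",
    "mb_trackid": "5a28222a-c75b-4572-a4cb-6f73b776ee65",
    "title": "Grogono",
    "track": "10",
    "year": "1995"
}
""")

item = coerce_item(sample)
print(item["title"], item["length_seconds"])
```

Without the path, records like this can't be tied back to a file on disk, which is the main gap.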

The schema of beets can be found here:

https://github.com/beetbox/beets/blob/3373b090bdae9bbc9ffb3653beb8553498e3c845/beets/library.py#L421-L493

https://github.com/beetbox/beets/blob/3373b090bdae9bbc9ffb3653beb8553498e3c845/beets/library.py#L893-L934

Both the item and album seem to be available, but only the items are exposed. The database is stored in the beets config folder in the library.db file. Here is its schema.

CREATE TABLE item_attributes (
  id INTEGER PRIMARY KEY, 
  entity_id INTEGER, 
  key TEXT, 
  value TEXT, 
  UNIQUE(entity_id, key) ON CONFLICT REPLACE
);
CREATE INDEX item_attributes_by_entity ON item_attributes (entity_id);
CREATE TABLE albums (
  id INTEGER PRIMARY KEY, artpath BLOB, 
  added REAL, albumartist TEXT, albumartist_sort TEXT, 
  albumartist_credit TEXT, album TEXT, 
  genre TEXT, year INTEGER, month INTEGER, 
  day INTEGER, disctotal INTEGER, comp INTEGER, 
  mb_albumid TEXT, mb_albumartistid TEXT, 
  albumtype TEXT, label TEXT, mb_releasegroupid TEXT, 
  asin TEXT, catalognum TEXT, script TEXT, 
  language TEXT, country TEXT, albumstatus TEXT, 
  albumdisambig TEXT, rg_album_gain REAL, 
  rg_album_peak REAL, r128_album_gain INTEGER, 
  original_year INTEGER, original_month INTEGER, 
  original_day INTEGER
);
CREATE TABLE album_attributes (
  id INTEGER PRIMARY KEY, 
  entity_id INTEGER, 
  key TEXT, 
  value TEXT, 
  UNIQUE(entity_id, key) ON CONFLICT REPLACE
);
CREATE INDEX album_attributes_by_entity ON album_attributes (entity_id);

This information was gathered with beets 1.4.7.

There are a couple of things we could do to integrate with beets.

  1. Open the database and start importing. This would work, but it relies directly on implementation details. Then again, beets isn't really changing much, and the export plugin also exposes implementation details by just enumerating all columns.

  2. Write a plugin to expose album and item data. Item data is exposed by the export plugin, but it is missing the path. Album data isn't exposed. This is leaky, but less leaky than just hooking into the database directly.
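To give a feel for option 1, here is a rough sketch of reading album rows straight from the database. It only uses columns from the albums table in the schema above. The read_albums name is hypothetical, and the demo runs against an in-memory database with a cut-down copy of that table instead of a real library.db.

```python
import sqlite3


def read_albums(conn: sqlite3.Connection) -> list[dict]:
    """Return a few album columns as plain dictionaries, ordered by id."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, album, albumartist, year, mb_albumid "
        "FROM albums ORDER BY id"
    )
    return [dict(row) for row in rows]


# Demo only: an in-memory stand-in with a subset of the albums columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums ("
    "id INTEGER PRIMARY KEY, album TEXT, albumartist TEXT, "
    "year INTEGER, mb_albumid TEXT)"
)
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?, ?)",
    (3, "G Spot.", "Speedy J", 1995, "015aa9b3-0e76-4121-865a-1b599bc20f8c"),
)

albums = read_albums(conn)
print(albums)
conn.close()
```

Against a real library it would probably be safer to open the file read-only, for example with sqlite3.connect("file:<path to library.db>?mode=ro", uri=True), so that forte can never modify the beets data.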

0xcaff commented 6 years ago

It looks like the path is removed from one of the representations. https://github.com/beetbox/beets/blob/db782a2404fa8a6827c10a6536b4a960d19af135/beetsplug/info.py#L69

It still returns the internal item where the data is generated. I think we should opt for 2 because it is a less leaky abstraction. This would require creating a new plugin and injecting it at runtime into a beets instance.

0xcaff commented 6 years ago

I've reached out to the beets community to see how they would like to go about this. https://discourse.beets.io/t/better-interoperability/460

0xcaff commented 6 years ago

The beets community hasn't been very responsive to our idea. If I were to guess why the path isn't returned, it is probably because paths are byte arrays and hard to represent in JSON. If we wanted to take advantage of beets, we should probably just make our own beets plugin.
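To illustrate the guess about paths: Python's json module refuses raw bytes, so a plugin would have to pick an encoding. The sketch below uses base64 as one possible choice. The helper names and the sample path are made up, and this is not how beets handles it.

```python
import base64
import json


def path_to_json(path: bytes) -> str:
    """Encode a byte-string path as ASCII text that is safe to put in JSON."""
    return base64.b64encode(path).decode("ascii")


def path_from_json(text: str) -> bytes:
    """Invert path_to_json and recover the original bytes."""
    return base64.b64decode(text)


# A made-up path with a byte (0xE9) that is not valid UTF-8 on its own.
raw_path = b"/music/caf\xe9/10 Grogono.mp3"

try:
    json.dumps({"path": raw_path})
    dumps_failed = False
except TypeError:
    # json.dumps cannot serialize bytes objects.
    dumps_failed = True

payload = json.dumps({"path": path_to_json(raw_path)})
restored = path_from_json(json.loads(payload)["path"])
print(dumps_failed, restored == raw_path)
```

Base64 keeps arbitrary bytes intact but isn't human-readable. Decoding to text would read better, at the cost of special handling for paths that aren't valid in the chosen encoding.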

0xcaff commented 5 years ago

Integrating with beets will take about as much code as just ripping the important ideas from beets and putting them into our thing. It seems like beets does the following:

I don't think we need this now. Maybe later.