beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

lastimport too strict? #3110

Closed ghost closed 4 years ago

ghost commented 5 years ago

Setup

My configuration (output of beet config) is:

directory: ~/Music/BeetsLibrary
library: ~/Music/beetslibrary.db
import:
  write: yes
  move: yes
  resume: no
match:
  distance_weights:
    missing_tracks: 0
    artist: 10
    title: 10
    length: 10
paths:
    default: $albumartist/$year $album%aunique{}/$track $title
    singleton: $artist/Non-Album/$year $title
    comp: Compilations/$album%aunique{}/$track $title
plugins:
  convert
  ftintitle
  lastimport
  info
  importadded
  fuzzy
  edit
lastfm:
  user: myusername

This plugin is great. However, I noticed that some tracks in my library were not matched with Last.fm data because of minor differences in the titles.

For example, in Random Access Memories by Daft Punk:

"Touch (feat. Paul Williams)" - Last.fm
"Touch" - My library

Or even worse, from M.A.A.D City:

"Compton (feat. Dr. Dre)" - Last.fm
"Compton feat. Dr. Dre" - My library (following MusicBrainz style guidelines)

I would try to fix it myself but I want to make sure I'm not missing something first.

sampsyo commented 5 years ago

True! It uses exact matches, with a few heuristic hacks. You can see an existing hack there that tries to normalize quote characters, for example: https://github.com/beetbox/beets/blob/8cfbc8274ee3b6ea42433a50becc19f1324f5cef/beetsplug/lastimport.py#L221-L229

You can be infinitely creative about how to cope with parentheses.
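The linked lines are not reproduced here. As a rough illustration of what normalizing quote characters before comparing titles can look like, here is a minimal sketch. The function name `normalize_quotes` and the exact set of characters are assumptions for illustration, not the plugin's actual code.

```python
# Hypothetical helper: map typographic quotes to their ASCII equivalents
# so that "She’s" and "She's" compare equal.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (typographic apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(title):
    """Return the title with typographic quotes replaced by ASCII ones."""
    return title.translate(QUOTE_TABLE)
```

Applying the same function to both the Last.fm title and the library title before comparing them would make the comparison independent of which quote style each side uses.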

ghost commented 5 years ago

I'm thinking of contributing this feature to the plugin. My current plan is a function (either written by me or taken from another open source project that has already dealt with this) that detects featured artists from the title string alone, creates variations of the title (e.g. ['Touch', 'Touch feat. Paul Williams', 'Touch (ft. Paul Williams)']), and then queries the library with each variation in turn to cover all bases.

Am I missing something in the Last.fm API that could be used to avoid this route? I assumed there isn't; otherwise, it would have been implemented from the beginning.
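The variation idea described above could be sketched roughly as follows. Everything here is an assumption for illustration: the name `title_variations`, the `FEAT_RE` pattern, and the particular set of spellings it generates. It only handles a featured-artist suffix at the end of the title.

```python
import re

# Hypothetical pattern: an optional opening bracket, a "feat"-style token,
# the featured artist text, and an optional closing bracket at the end.
FEAT_RE = re.compile(
    r"\s*[\(\[]?\s*\b(?:featuring|feat\.?|ft\.?)\s+(?P<artist>[^\)\]]+)[\)\]]?\s*$",
    re.IGNORECASE,
)


def title_variations(title):
    """Return the title plus alternative spellings of its "feat." suffix."""
    match = FEAT_RE.search(title)
    if not match:
        return [title]

    base = title[:match.start()].strip()
    artist = match.group("artist").strip()
    candidates = [
        title,
        base,
        f"{base} feat. {artist}",
        f"{base} (feat. {artist})",
        f"{base} (ft. {artist})",
    ]

    # Drop duplicates while keeping the original order.
    unique = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique


print(title_variations("Touch (feat. Paul Williams)"))
print(title_variations("Compton feat. Dr. Dre"))
```

Each returned variation would then be tried against the library until one matches.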

sampsyo commented 5 years ago

Yep, I think you’re on the right track doing it this way! Thanks for looking into it.

ghost commented 5 years ago

I was considering using feat_tokens from plugins.py, but quickly realized that it is insufficient for this task:

import re

from beets import plugins

# feat_tokens() returns a regex pattern string for "featuring"-style tokens.
feat_tokens = plugins.feat_tokens()

titles = [
    'Sucker For Pain (with Wiz Khalifa, Imagine Dragons, Logic & Ty Dolla $ign feat. X Ambassadors)',
    'Ticker Tape (feat. Carly Simon & Kali Uchis)',
    'Lose Yourself to Dance (feat. Pharrell Williams)',
    'She’s My Collar (Feat. Kali Uchis)',
    'SIRENS | Z1RENZ [FEAT. J.I.D | J.1.D]',
    'She Wolf (Falling to Pieces) [feat. Sia]',
    'Love’s Vagrant (Ringabell) [ft. Ralfington]',
]

for title in titles:
    print(title)
    # List every token the pattern finds in this title.
    print(re.findall(feat_tokens, title))

returns

Sucker For Pain (with Wiz Khalifa, Imagine Dragons, Logic & Ty Dolla $ign feat. X Ambassadors)
['&', 'feat.']
Ticker Tape (feat. Carly Simon & Kali Uchis)
['&']
Lose Yourself to Dance (feat. Pharrell Williams)
[]
She’s My Collar (Feat. Kali Uchis)
[]
SIRENS | Z1RENZ [FEAT. J.I.D | J.1.D]
[]
She Wolf (Falling to Pieces) [feat. Sia]
[]
Love’s Vagrant (Ringabell) [ft. Ralfington]
[]

It does not check for parentheses, brackets, capitalization, etc.

But I'm not sure whether I should expand the function's matching ability. The regex would have to be a monstrous one that can find, for example: 'ft.', 'Ft.', 'FT.', 'feat.', 'Feat.', 'FEAT.', 'f/', 'F/', 'f.', 'F.', 'featuring', 'Featuring', 'FEATURING', 'with', 'With', 'WITH', 'vs', 'vs.', 'VS', 'VS.', 'Vs.', 'Vs', 'and', 'And', 'AND', 'con', 'Con', 'CON', '&', etc.
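One way to keep such a pattern manageable is to list each token once and let `re.IGNORECASE` cover the capitalization variants. This is a sketch under assumptions: `FEAT_WORDS`, `FEAT_PATTERN`, and `find_feat_tokens` are hypothetical names, and the pattern is not the one `feat_tokens` builds.

```python
import re

# Each token appears once; case variants are handled by re.IGNORECASE.
FEAT_WORDS = [
    "featuring", "feat.", "feat", "ft.", "ft", "f/", "f.",
    "with", "vs.", "vs", "and", "con", "&",
]

# Try longer tokens first so "feat." wins over "feat", and require that the
# token is not glued to a word character on either side. Brackets and
# parentheses are not word characters, so "(feat." and "[ft." still match.
_alternatives = "|".join(
    re.escape(word) for word in sorted(FEAT_WORDS, key=len, reverse=True)
)
FEAT_PATTERN = re.compile(rf"(?<!\w)(?:{_alternatives})(?!\w)", re.IGNORECASE)


def find_feat_tokens(title):
    """Return every "featuring"-style token found in the title."""
    return FEAT_PATTERN.findall(title)


print(find_feat_tokens("She Wolf (Falling to Pieces) [feat. Sia]"))
print(find_feat_tokens("SIRENS | Z1RENZ [FEAT. J.I.D | J.1.D]"))
```

A pattern this broad will also match ordinary words such as "and" or "with" inside a title, so the concern about matching too much still applies.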

I'm worried that such a change would cause problems for other plugins and match too much. On the other hand, it might benefit other plugins that use the function by increasing their accuracy.

I'm thinking I have 3 main options:

  1. expand feat_tokens
  2. make a new function that expands on feat_tokens (perhaps expanded_feat_tokens or something) in plugins.py
  3. just keep the matching logic within the function I'm making in lastimport.py

What do you think?

sampsyo commented 5 years ago

Hmm… it seems like the patterns you're looking for are pretty idiosyncratic to Last.fm's chaotic naming conventions. So maybe special-purpose logic just for the plugin is in order?
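Keeping the logic local to the plugin could be as small as a loop over candidate titles. In this sketch, `find_first_match` and `lookup` are hypothetical; `lookup` stands in for whatever library query the plugin performs, and a plain dictionary replaces the beets library so the example runs on its own.

```python
def find_first_match(candidate_titles, lookup):
    """Return the first library hit among the candidate titles, or None."""
    for candidate in candidate_titles:
        item = lookup(candidate)
        if item is not None:
            return item
    return None


# Stand-in for the library: maps lowercased titles to items.
library = {"touch": "item-1"}


def lookup(title):
    # Hypothetical query; a real plugin would query the beets library here.
    return library.get(title.lower())


match = find_first_match(["Touch (feat. Paul Williams)", "Touch"], lookup)
print(match)
```

Trying the exact Last.fm title first keeps the current behavior for tracks that already match, and the looser variations only come into play as a fallback.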

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.