beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

Auto-tagger picking weaker match #3564

Closed nirvdrum closed 4 years ago

nirvdrum commented 4 years ago

Problem

I'm running into a problem where the auto-tagger is ignoring my country preference and consequently picking a weaker match. The vast majority of my CD collection consists of US releases, but I do have some imports. As a result, my match.preferred.countries configuration is ['US'].

Running this command in verbose (-vv) mode:

beet -vv import -t /media/cd_rips/Pretty\ Hate\ Machine/

Led to this problem:

overlaying configuration: /root/beets/config.yaml
no user configuration found at /root/.config/beets/config.yaml
data directory: /root/.config/beets
plugin paths: 
Sending event: pluginload
inline: adding item field albumtype_display
inline: adding item field compilation_display
inline: adding item field country_display
inline: adding item field disambiguation_display
lyrics: Disabling google source: no API key configured.
library database: /media/music/.beets.db
library directory: /media/music
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: /media/cd_rips/Pretty Hate Machine
Tagging Nine Inch Nails - Pretty Hate Machine
No album ID found.
Search terms: Nine Inch Nails - Pretty Hate Machine
Album might be VA: False
Searching for MusicBrainz releases with: {'release': 'pretty hate machine', 'artist': 'nine inch nails', 'tracks': '10'}
Requesting MusicBrainz release 95e2508f-e830-4a79-a85c-ad5db1ae64cd
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (95e2508f-e830-4a79-a85c-ad5db1ae64cd)
Computing track assignment...
...done.
Success. Distance: 0.05
Requesting MusicBrainz release ddf2fb0f-d0a7-4fd1-86ae-0843f57b95fd
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (ddf2fb0f-d0a7-4fd1-86ae-0843f57b95fd)
Computing track assignment...
...done.
Success. Distance: 0.06
Requesting MusicBrainz release 5b5d00e8-3d90-3da7-9c17-c9e0e0376076
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (5b5d00e8-3d90-3da7-9c17-c9e0e0376076)
Computing track assignment...
...done.
Success. Distance: 0.02
Requesting MusicBrainz release c90cf375-3208-378a-8535-8c9d68b5446b
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (c90cf375-3208-378a-8535-8c9d68b5446b)
Computing track assignment...
...done.
Success. Distance: 0.05
Requesting MusicBrainz release d0e34aea-d5d4-3ec1-9147-24e1de5f001d
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (d0e34aea-d5d4-3ec1-9147-24e1de5f001d)
Computing track assignment...
...done.
Success. Distance: 0.03
discogs: Searching for master release 3406
discogs: hit rate limit, waiting for 0.6294822692871094 seconds
discogs: Searching for master release 3406
discogs: hit rate limit, waiting for 0.6566057205200195 seconds
discogs: Searching for master release 3406
discogs: hit rate limit, waiting for 0.6517527103424072 seconds
discogs: Searching for master release 3406
discogs: hit rate limit, waiting for 0.6592931747436523 seconds
discogs: Searching for master release 3406
discogs: hit rate limit, waiting for 0.6544384956359863 seconds
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (3004371)
Computing track assignment...
...done.
Success. Distance: 0.09
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (75544)
Computing track assignment...
...done.
Success. Distance: 0.06
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (9154868)
Computing track assignment...
...done.
Success. Distance: 0.09
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (2580092)
Computing track assignment...
...done.
Success. Distance: 0.14
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (145796)
Computing track assignment...
...done.
Success. Distance: 0.03
Evaluating 10 candidates.

/media/cd_rips/Pretty Hate Machine (10 items)
Sending event: before_choose_candidate
Tagging:
    Nine Inch Nails - Pretty Hate Machine
URL:
    https://musicbrainz.org/release/5b5d00e8-3d90-3da7-9c17-c9e0e0376076
(Similarity: 98.1%) (country, year) (CD, 1991, AU, Interscope Records, 7567-91834-2)

As you can see, I'm only presented with a single choice: an Australian release with a 98.1% match. However, if I search on MusicBrainz, there's a US release with a 100% match:

[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort, eDit, edit Candidates? i
Enter release ID: 60a04a88-3956-49f5-9d0f-b2603be9f612
Tagging Nine Inch Nails - Pretty Hate Machine
Searching for album ID: 60a04a88-3956-49f5-9d0f-b2603be9f612
Requesting MusicBrainz release 60a04a88-3956-49f5-9d0f-b2603be9f612
primary MB release type: album
Sending event: albuminfo_received
Candidate: Nine Inch Nails - Pretty Hate Machine (60a04a88-3956-49f5-9d0f-b2603be9f612)
Computing track assignment...
...done.
Success. Distance: 0.00
discogs: Searching for release 60a04a88-3956-49f5-9d0f-b2603be9f612
Evaluating 1 candidates.
Sending event: before_choose_candidate
Tagging:
    Nine Inch Nails - Pretty Hate Machine
URL:
    https://musicbrainz.org/release/60a04a88-3956-49f5-9d0f-b2603be9f612
(Similarity: 100.0%) (CD, 1989, US, TVT Records, TVT 2610-2)

I'm able to select "More candidates", but that seems to have an upper limit of 10 items. Some releases I'm looking at have more than 10 and the 100% match is buried deeper in the list.

Setup

My configuration (output of beet config) is:

lyrics:
    bing_lang_from: []
    auto: yes
    force: yes
    bing_client_secret: REDACTED
    bing_lang_to:
    google_API_key: REDACTED
    google_engine_ID: REDACTED
    genius_api_key: REDACTED
    fallback:
    local: no
    sources:
    - google
    - lyricwiki
    - musixmatch
    - genius
directory: /media/music
library: /media/music/.beets.db
per_disc_numbering: yes

plugins:
- discogs
- duplicates
- edit
- embedart
- extrafiles
- fetchart
- info
- inline
- lastgenre
- lyrics
- missing
- plexupdate
- scrub
- the
- web
discogs:
    user_token: REDACTED
    apikey: REDACTED
    apisecret: REDACTED
    tokenfile: discogs_token.json
    source_weight: 0.5
embedart:
    auto: no
    maxwidth: 0
    compare_threshold: 0
    ifempty: no
    remove_art_file: no
extrafiles:
    patterns:
        artwork: ['*.jpg']
        eac_logs: ['*.log']

    paths: {}
fetchart:
    midwidth: 400
    sources:
    - filesystem
    - itunes
    - amazon
    - albumart
    - coverart
    auto: yes
    minwidth: 0
    maxwidth: 0
    enforce_ratio: no
    cautious: no
    cover_names:
    - cover
    - front
    - art
    - album
    - folder
    google_key: REDACTED
    google_engine: 001442825323518660753:hrh5ch1gjzm
    fanarttv_key: REDACTED
    store_source: no
format_album: $albumartist - $album ($year) $id
item_fields:
    albumtype_display: ''''' if albumtype.upper() == ''ALBUM'' or albumtype.upper() == ''COMPILATION'' else '' [%s]'' % (albumtype.upper())'
    compilation_display: '''Soundtracks'' if albumtype.upper() == ''SOUNDTRACK'' else ''Compilations'''
    country_display: country_display = ' [%s]' % (country.upper()); return '' if country.strip() == '' or country.upper() == 'US' or country_display in album else country_display
    disambiguation_display: ''''' if albumdisambig == '''' else '' (%s)'' % ('' ''.join([part.capitalize() for part in albumdisambig.split('' '')]))'

match:
    preferred:
        countries:
        - US
        media: [CD, Digital Media|File]
        original_year: yes

paths:
    comp: $compilation_display/$year - $album$country_display$disambiguation_display (%upper{$format})/$disctitle/$track $title
    default: '%the{$albumartist}/$year - $album$country_display$albumtype_display$disambiguation_display (%upper{$format})/$disctitle/$track $title'
plex:
    host: 192.168.13.202
    port: 32400
    token: REDACTED
    library_name: Music
web:
    host: 0.0.0.0
    port: 8337
    cors: ''
    cors_supports_credentials: no
    reverse_proxy: no
    include_paths: no
edit:
    albumfields: album albumartist
    itemfields: track title artist album
    ignore_fields: id path
duplicates:
    album: no
    checksum: ''
    copy: ''
    count: no
    delete: no
    format: ''
    full: no
    keys: []
    merge: no
    move: ''
    path: no
    tiebreak: {}
    strict: no
    tag: ''
pathfields: {}
album_fields: {}
lastgenre:
    whitelist: yes
    min_weight: 10
    count: 1
    fallback:
    canonical: no
    source: album
    force: yes
    auto: yes
    separator: ', '
    prefer_specific: no
missing:
    count: no
    total: no
    album: no
scrub:
    auto: yes
the:
    the: yes
    a: yes
    format: '{0}, {1}'
    strip: no
    patterns: []

sampsyo commented 4 years ago

Hi! See also https://github.com/beetbox/beets/issues/3171#issuecomment-468513368 for a similar discussion.

While this is indeed an issue, it's not clear what we can do about it. For example, we could decide to increase the number of results fetched by default—but how many would be enough? Each result we ask for has a performance penalty (we have to round-trip to the MB server for each one). There's no way to force MusicBrainz to bring up the most relevant matches first. So I'm unfortunately not sure there's a clear way to resolve this.
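
To illustrate the limitation with the numbers from the verbose log in the report: ranking can only consider candidates that were actually fetched. This is a minimal sketch, not beets' actual matching code. The `best_candidate` helper and the shortened IDs are made up for illustration, and only the five MusicBrainz candidates from the log are included.

```python
# Distances copied from the verbose log in the report.
# IDs are shortened to their first block for readability.
fetched = {
    "95e2508f": 0.05,
    "ddf2fb0f": 0.06,
    "5b5d00e8": 0.02,  # the AU release that was offered
    "c90cf375": 0.05,
    "d0e34aea": 0.03,
}

# The US release, which only showed up after entering its ID by hand.
not_fetched = {"60a04a88": 0.00}


def best_candidate(distances):
    """Return the (id, distance) pair with the smallest distance."""
    return min(distances.items(), key=lambda pair: pair[1])


best_of_fetched = best_candidate(fetched)
best_of_all = best_candidate({**fetched, **not_fetched})

print(best_of_fetched)
print(best_of_all)
```

With only the fetched set, the AU release wins. The US release wins as soon as it is part of the candidate pool, so the outcome depends on what the search returns rather than on how candidates are scored.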

nirvdrum commented 4 years ago

I guess one way to start would be to allow loading more than 10 candidates when performing the "More candidates" action. As far as I can tell, the user can only perform that action once. I'd be able to find the match more easily if I could paginate through more results. Currently, I have to open the URL for the wrong match, go to the releases page, find the likely candidate, copy its ID, and go back to beets to enter it.

While I appreciate not wanting to impact performance in common cases, my manual solution is considerably slower than I anticipate fetching more results programmatically would be.
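
The pagination idea could look roughly like the sketch below. It assumes a hypothetical `fetch_page(offset, limit)` callable standing in for whatever search backend is used. Nothing here is an existing beets or MusicBrainz API, and the fake backend just slices a list of made-up IDs.

```python
from typing import Callable, Iterator, List

Page = List[str]


def paginate(fetch_page: Callable[[int, int], Page],
             page_size: int = 10) -> Iterator[Page]:
    """Yield successive pages until the backend returns a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(page)


# Stand-in backend: 23 made-up release IDs instead of a real query.
FAKE_RELEASES = [f"release-{i:02d}" for i in range(23)]


def fake_fetch(offset: int, limit: int) -> Page:
    return FAKE_RELEASES[offset:offset + limit]


pages = list(paginate(fake_fetch))
print([len(p) for p in pages])
```

Because `paginate` is a generator, each "More candidates" request could pull just one more page with `next()`, so the extra round-trips are only paid when the user asks for them.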

sampsyo commented 4 years ago

Hmm; if it's going to require explicit user interaction, what do you think about the proposed solution in #3171? It would fetch all the releases for the same release group.

nirvdrum commented 4 years ago

A new option that would fetch all releases would be fine by me. Alternatively, I'd be happy adjusting the default search limit. I haven't gone digging through the code, but I couldn't find a parameter that seemed to match. If such an option exists, setting it to 20 or 25 would cover a large number of the cases I've run into. I don't know what the overall distribution of release count is, however.

stale[bot] commented 4 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.