Importer fails to find perfect match until given MusicBrainz release ID

Greetings,

I recently discovered beets and have been trying to import my CD collection. I'm having trouble getting the importer to present the correct album version as a candidate for selection. First I rip the CD with 'fre:ac', which adds correct tags to my FLAC files for ARTIST, TITLE, ALBUM, DATE, GENRE, and MEDIA. When I start the import, beets correctly deduces the album and artist and presents what it thinks is a match, but this is often the wrong country or wrong year for the album release. So in my config I added match preferences to try to force only candidates associated with my country, media type, and album year.

This doesn't help much. I continue to get candidates for other media types and countries. Beets knows these aren't great matches and assigns them less than a 100% score. Here is a screenshot of an attempt to import the album:

The candidate presented isn't a perfect match since the year doesn't match. So I ask for more candidates and get five that aren't matches:

What is frustrating is that if I bring up the MusicBrainz release versions page, I can see the correct version I want is the first one on the page after throwing out entries not matching my country or media type:

To work around this problem, I find the correct release on MusicBrainz and paste its URL into the beets importer prompt. When I do that, beets shows my manual entry as a 100% match!

What is going on here? Is beets not getting all the possible candidates back from MusicBrainz in order to find a better match? Is there a way to ask for more than the five candidates I see during the import session? Having to manually search the website for the correct match and paste it into beets gets tedious fast.

Debug

Running 'beet import' in verbose (-vv) mode:

user configuration: /Users/moore/.config/beets/config.yaml
data directory: /Users/moore/.config/beets
plugin paths: 
Sending event: pluginload
library database: /Users/moore/.config/beets/library.db
library directory: /Users/moore/Music/Beets-Test
Sending event: library_opened
Sending event: import_begin
state file could not be read: [Errno 2] No such file or directory: '/Users/moore/.config/beets/state.pickle'
state file could not be read: [Errno 2] No such file or directory: '/Users/moore/.config/beets/state.pickle'
Sending event: import_task_created
Sending event: import_task_start
Looking up: /Users/moore/Music/freac-rips/U2/War
Tagging U2 - War
No album ID found.
Search terms: U2 - War
Album might be VA: False
Searching for MusicBrainz releases with: {'release': 'war', 'artist': 'u2', 'tracks': '10'}
Requesting MusicBrainz release 1318f39c-362a-4266-b9de-c653525d52c6
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (1318f39c-362a-4266-b9de-c653525d52c6)
Computing track assignment...
...done.
Success. Distance: 0.04
Requesting MusicBrainz release 4ec1a9bf-fd01-4184-9def-7940874547cf
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (4ec1a9bf-fd01-4184-9def-7940874547cf)
Computing track assignment...
...done.
Success. Distance: 0.02
Requesting MusicBrainz release 8a278f76-f452-42a4-aede-fdd1990c09ca
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (8a278f76-f452-42a4-aede-fdd1990c09ca)
Computing track assignment...
...done.
Success. Distance: 0.04
Requesting MusicBrainz release 074ab7a5-eac2-4e87-8748-6c3034bdeeb9
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (074ab7a5-eac2-4e87-8748-6c3034bdeeb9)
Computing track assignment...
...done.
Success. Distance: 0.04
Requesting MusicBrainz release 1219c9fd-7278-44ce-a480-d4f5737c640d
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (1219c9fd-7278-44ce-a480-d4f5737c640d)
Computing track assignment...
...done.
Success. Distance: 0.07
Evaluating 5 candidates.

/Users/moore/Music/freac-rips/U2/War (10 items)
Sending event: import_task_before_choice
Sending event: before_choose_candidate
Tagging:
    U2 - War
URL:
    https://musicbrainz.org/release/4ec1a9bf-fd01-4184-9def-7940874547cf
(Similarity: 97.8%) (year) (CD, 2008, US, Universal Island Records, B0010832-02)
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? Finding tags for album "U2 - War".
Candidates:
1. U2 - War (97.8%) (year) (CD, 2008, US, Universal Island Records, B0010832-02)
2. U2 - War (96.5%) (media) (Digital Media, 1983, US, un-remastered iTunes)
3. U2 - War (96.1%) (year, country) (CD, 2008, XE, Universal Island Records, 1764647, super jewel box)
4. U2 - War (96.1%) (year, country) (CD, 2008, XE, Universal Island Records, 1764647, jewel case)
5. U2 - War (92.5%) (media, year, country) (12" Vinyl, 2008, XE, Mercury Records, 1761674)
# selection (default 1), Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? Enter release ID: Tagging U2 - War
Searching for album ID: https://musicbrainz.org/release/cd62302d-c3aa-439b-b4f3-1a891625581e
Requesting MusicBrainz release https://musicbrainz.org/release/cd62302d-c3aa-439b-b4f3-1a891625581e
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_track_extract
Sending event: mb_album_extract
Sending event: albuminfo_received
Candidate: U2 - War (cd62302d-c3aa-439b-b4f3-1a891625581e)
Computing track assignment...
...done.
Success. Distance: 0.00
Evaluating 1 candidates.
Sending event: before_choose_candidate
Tagging:
    U2 - War
URL:
    https://musicbrainz.org/release/cd62302d-c3aa-439b-b4f3-1a891625581e
(Similarity: 100.0%) (CD, 1983, US, Island, 90067-2, Target design, made in West Germany by Polygram)
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? Sending event: import
Sending event: cli_exit

Setup

OS: MacOS
Python version: 3.10.11
beets version: 1.6.1 (latest from github master)
Turning off plugins made problem go away (yes/no): No

My configuration (output of beet config) is:

directory: /Users/moore/Music/Beets-Test

import:
    write: no
    copy: yes
    move: no
    timid: yes
    reflink: auto
    resume: ask
    incremental: yes
    log: /Users/moore/Music/log/importer.log

ui:
    color: no

match:
    preferred:
        countries: [US]
        media: [CD]
        original_year: yes

As you intuited, the likely explanation is that the correct match is too far down the list of results that MusicBrainz returns. Even filters like preferred countries have to be applied on the client side: we fetch the top N albums and then sort them by these criteria, so we can't apply them before the list is truncated to N.
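To make the ordering problem concrete, here is a toy model of "truncate first, rank second". Release, rank_top_n, and the sample data are hypothetical. Also, beets' documentation describes match.preferred as a distance penalty rather than a literal sort, so this only models the ordering effect. Either way, a release sitting at position N+1 in the server's response is never seen.

```python
from dataclasses import dataclass


@dataclass
class Release:
    release_id: str
    country: str
    media: str


def rank_top_n(results: list[Release], n: int,
               countries: list[str], media: list[str]) -> list[Release]:
    """Keep the first n results (as the server returned them), then move
    releases matching the preferred country/media to the front."""
    top = results[:n]  # truncation happens before any preference is applied

    def penalty(r: Release) -> int:
        # 0 = matches both preferences, 2 = matches neither
        return (r.country not in countries) + (r.media not in media)

    # sorted() is stable, so ties keep the server's original order
    return sorted(top, key=penalty)


results = [
    Release("a", "XE", "CD"),
    Release("b", "US", "Digital Media"),
    Release("c", "XE", '12" Vinyl'),
    Release("d", "US", "CD"),  # the wanted release, 4th in the response
]
print([r.release_id for r in rank_top_n(results, 3, ["US"], ["CD"])])  # ['a', 'b', 'c']
print([r.release_id for r in rank_top_n(results, 4, ["US"], ["CD"])])  # ['d', 'a', 'b', 'c']
```

Raising the limit from 3 to 4 is the only thing that lets the preferred release win, which is what increasing searchlimit does.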

You can raise the number of results we request from MusicBrainz above the default of 5 with the searchlimit option: https://beets.readthedocs.io/en/stable/reference/config.html#searchlimit

Doing this makes searches take longer but is more likely to find the specific release you want, for release groups with large numbers of variants.

Adrian,

Thanks for pointing out the 'searchlimit' config parameter. By setting it higher, beets was in fact able to find and properly suggest the correct album release. The downside, as you point out, is much longer search times before the importer prompts the user for action.

On my computer it takes about 1.5 seconds for each candidate, which adds up when I set searchlimit high. Until playing with beets and MusicBrainz, I had no idea my albums had so many variants!

I started investigating why this search takes so long. My first thought was that the calls to the MusicBrainz API must be the problem, perhaps because they are rate limited. I tested some calls to musicbrainzngs.search_releases(artist=artist, release=album, limit=<n>) and found that returning 25 releases takes essentially the same time as returning 5 (less than a second). So I don't think the API is the problem. The delay seems to come from beets incrementally evaluating every release, even after it finds a perfect match. Is there a reason why beets needs to keep evaluating after it finds a match?
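One way to check where the time goes is to time the search separately from the per-release detail requests that follow it. time_lookup is a hypothetical helper, not part of beets or musicbrainzngs. It takes the two request functions as arguments so it can run without network access. With musicbrainzngs installed, you could pass musicbrainzngs.search_releases and musicbrainzngs.get_release_by_id after calling musicbrainzngs.set_useragent(...).

```python
import time
from typing import Callable


def time_lookup(search: Callable[..., dict],
                get_release: Callable[..., dict],
                artist: str, album: str, limit: int) -> tuple[float, float]:
    """Return (seconds spent in the one search, seconds spent fetching details)."""
    start = time.perf_counter()
    result = search(artist=artist, release=album, limit=limit)
    searched = time.perf_counter()
    # Search hits carry no track listings, so each hit costs one more request.
    for hit in result.get("release-list", []):
        get_release(hit["id"], includes=["artists", "recordings"])
    done = time.perf_counter()
    return searched - start, done - searched
```

One thing to watch for: musicbrainzngs throttles requests to about one per second by default (its set_rate_limit function controls this). So the detail phase would be expected to grow roughly linearly with the limit even when the search itself does not.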

As a user, I do not know a priori how many candidates I need to tell beets to evaluate before a good match is found. Wouldn't it make more sense for beets to just download all candidates, loop through them evaluating the 'distance', and then stop when a 100% match is found? This would be fast when the match is early in the list and would guarantee that a match is eventually found. Only if the user is not satisfied with the first '100% match' would beets need to be instructed to continue to evaluate more of the candidates. @sampsyo your thoughts?
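The proposal amounts to lazy evaluation with an early exit. Here is a minimal sketch, assuming hypothetical fetch_candidate and distance callables (not beets APIs), where a distance of 0.0 plays the role of the "100% match".

```python
from typing import Callable, Iterable, TypeVar

C = TypeVar("C")


def first_good_matches(ids: Iterable[str],
                       fetch_candidate: Callable[[str], C],
                       distance: Callable[[C], float],
                       stop_below: float = 0.0) -> list[tuple[float, C]]:
    """Fetch and score candidates in order, stopping as soon as one scores
    at or below stop_below. Returns everything scored so far, best first."""
    scored: list[tuple[float, C]] = []
    for release_id in ids:
        candidate = fetch_candidate(release_id)  # the expensive network call
        d = distance(candidate)
        scored.append((d, candidate))
        if d <= stop_below:
            break  # later candidates are never fetched
    scored.sort(key=lambda pair: pair[0])
    return scored
```

The trade-off is that stopping at the first perfect score hides any other release that scores equally well. For example, two pressings that differ only in barcode would both score 0.0, and only the first would be shown.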

In general, there are two things that take time when beets is matching albums:

1. Fetching details for each album from MusicBrainz. The initial search result doesn't include track details, so we have to issue subsequent API requests for that here: https://github.com/beetbox/beets/blob/62859f4389715d87af36827b6042d21a82e91fdc/beets/autotag/mb.py#L630-L635
2. The actual computation to find the best match. This works by finding the best mapping from every track you have to every track on MusicBrainz, which amounts to the "assignment problem" or "bipartite matching problem". Efficient algorithms for it, such as the Hungarian algorithm, run in O(n^3) time, so it can get slow when there are a lot of tracks.
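For anyone curious what the assignment step involves, here is a textbook O(n^3) Hungarian algorithm in pure Python. It is an illustration only, not beets' implementation. The track-length cost in the example is a made-up stand-in for beets' real per-track distance.

```python
def hungarian(cost: list[list[float]]) -> list[int]:
    """Return assignment[i] = column matched to row i, minimizing total cost.

    Requires at least as many columns as rows (len(cost) <= len(cost[0])).
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    # Potentials u (rows) and v (columns); p[j] = row matched to column j.
    # Indices are 1-based; index 0 is a dummy used by the path search.
    u = [0.0] * (n + 1)
    v = [0.0] * (m + 1)
    p = [0] * (m + 1)
    way = [0] * (m + 1)
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Walk the augmenting path back, flipping matched edges.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical track lengths in seconds: ripped tracks vs. a candidate release.
mine = [215, 190, 302]
theirs = [301, 214, 189, 250]
print(hungarian([[abs(a - b) for b in theirs] for a in mine]))  # [1, 2, 0]
```

Each outer iteration does O(n*m) work over up to n rows, which is where the cubic cost comes from. It runs once per candidate release, on top of the network fetch.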

The first thing is probably dominant in most cases, but you might try profiling in even more detail if you're curious! The computational side is theoretically avoidable if we wanted to eagerly discard matches, but it could be sort of complicated to rearrange things to make that work.

By changing my workflow, I've made the extra delay from a high searchlimit a non-issue. Instead of rip-import-rip-import, I rip a stack of CDs and then import them all at once. By the time I've finished answering the first set of import questions, the second album has been evaluated and is ready for review. There is no user-noticeable delay after the first album, presumably thanks to the magic of the parallel import pipeline. Nice!
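The effect described here is ordinary producer/consumer overlap. Below is a toy two-stage version in which a worker thread looks albums up while the main thread handles the prompts. lookup and prompt are hypothetical stand-ins, and beets' real import pipeline is more elaborate than this.

```python
import queue
import threading
from typing import Callable

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(albums: list[str],
                 lookup: Callable[[str], str],
                 prompt: Callable[[str], str],
                 buffer: int = 2) -> list[str]:
    """Look albums up in a background thread; prompt for each in order."""
    ready: queue.Queue = queue.Queue(maxsize=buffer)

    def producer() -> None:
        try:
            for album in albums:
                ready.put(lookup(album))  # blocks if the user falls behind
        finally:
            ready.put(_DONE)  # always unblock the consumer, even on error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    answers = []
    while (item := ready.get()) is not _DONE:
        answers.append(prompt(item))  # lookups for later albums keep running
    worker.join()
    return answers
```

While the user answers the prompt for album 1, the worker is already fetching candidates for album 2. Only the first lookup is felt as a wait.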

However, it can still be impossible to determine the correct release version from the beets UI, even with all variants loaded. MusicBrainz considers an album to be a different version if it differs in media type, country, year, label, or barcode. During the import process, beets does not show the barcode, so I can end up in a situation where it is impossible to disambiguate. See this example with the album "The Trinity Session" by "Cowboy Junkies":

(Screenshot: cowboy-junkies-import)

(Screenshot: cowboy-junkies-MB)

I was hoping we could add the barcode to the formatted output during candidate selection, but it doesn't look like it is included in the AlbumInfo class. I suppose it could be added, since the MB API returns it.

For the task of importing full albums that are in the MusicBrainz database, it seems to me that a single call to musicbrainzngs.search_releases(artist=artist, release=album) is all that is needed to return the details required to display the candidates and let the user correctly choose the proper match for the album in hand, assuming the album title and artist are read from the input files' tags.
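Here is a sketch of building a disambiguation line, including the barcode, directly from search hits. It assumes the hits are shaped like musicbrainzngs search_releases() output, with keys such as "medium-list", "label-info-list", and "barcode". Those key names are an assumption to verify against real responses. Every field is read with a default because MusicBrainz often leaves them empty. describe is a hypothetical helper, not a beets function.

```python
def describe(hit: dict) -> str:
    """Format one release search hit like the importer's '(CD, 1983, US, ...)'."""
    media = " + ".join(m["format"] for m in hit.get("medium-list", [])
                       if m.get("format"))
    labels = hit.get("label-info-list", [])
    label = labels[0].get("label", {}).get("name", "") if labels else ""
    catno = labels[0].get("catalog-number", "") if labels else ""
    parts = [media, hit.get("date", "")[:4], hit.get("country", ""),
             label, catno, hit.get("barcode", "")]
    details = ", ".join(p for p in parts if p)  # drop empty fields
    title = hit.get("title", "?")
    return f"{title} ({details})" if details else title
```

Usage would be something like mapping describe over result["release-list"] for a search result. This covers display only. Actually applying a match would still need the per-release track details mentioned earlier in the thread.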

I'm wondering whether a simpler import for this kind of use case (album in hand) would be possible as a plugin that then passes the correct MusicBrainz ID back to the normal importer to finish the task. Or maybe we just need to store and display the barcode in the normal importer.

A possible solution would be an option, when selecting the release, to load more search results. That way you don't have to increase the search limit and slow everything down, but you can still load more releases when needed.
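A "load more" option maps naturally onto a lazy pager that requests the next page only when asked. This minimal sketch assumes search() accepts limit and offset keywords, as musicbrainzngs.search_releases does, and returns a dict with "release-list" and, optionally, "release-count". pages is a hypothetical helper.

```python
from typing import Callable, Iterator


def pages(search: Callable[..., dict], page_size: int = 5,
          **fields: str) -> Iterator[list[dict]]:
    """Yield one page of release hits per next() call."""
    offset = 0
    while True:
        result = search(limit=page_size, offset=offset, **fields)
        hits = result.get("release-list", [])
        if not hits:
            return  # empty page: nothing left
        yield hits
        offset += len(hits)
        if "release-count" in result and offset >= int(result["release-count"]):
            return  # the server reported no further results
```

An importer could create more = pages(musicbrainzngs.search_releases, artist="U2", release="War") and call next(more) each time the user picks "load more". The default prompt then stays as fast as searchlimit: 5.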

Both of these (including more info in the disambiguation string and a "load more" option) seem reasonable. The first is actually already covered by #845. It probably wouldn't be too hard to implement, if you're interested.

I've coded the changes needed to get barcode from MusicBrainz and to display it with the other disambiguation info during import. Seems to work well for my use case. I will test a little more and then submit a PR shortly.

beetbox / beets

Importer fails to find perfect match until given MusicBrainz release ID #4882
