Closed dosoe closed 4 years ago
Hmm; is it really just big releases that don't include work relationships? Can you link to an example where the works are included and one where they're not?
It can be instructive, when investigating MB behaviors like this, to take these and load the actual API response in a browser. Here's an API lookup for one of those albums, for instance: https://musicbrainz.org/ws/2/release/9c5c043e-bc69-4edb-81a4-1aaf9c81e6dc?inc=recordings+work-rels
This can let you quickly experiment with different releases and `inc` values to see what appears.
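For quick experiments, a small helper can build these lookup URLs for different entities and `inc` values. This is only a sketch: `lookup_url` and `MB_ROOT` are names made up for illustration, not part of beets or musicbrainzngs.

```python
from urllib.parse import urlencode

# Base of the MusicBrainz web service, as used in the URLs in this thread.
MB_ROOT = "https://musicbrainz.org/ws/2"


def lookup_url(entity, mbid, includes):
    """Build a lookup URL such as .../release/<mbid>?inc=recordings+work-rels.

    Hypothetical helper: `includes` is a list of inc values joined with "+".
    """
    # safe="+" keeps the "+" separators literal instead of percent-encoding them.
    query = urlencode({"inc": "+".join(includes)}, safe="+")
    return f"{MB_ROOT}/{entity}/{mbid}?{query}"


if __name__ == "__main__":
    print(lookup_url("release", "9c5c043e-bc69-4edb-81a4-1aaf9c81e6dc",
                     ["recordings", "work-rels"]))
```

Pasting the printed URL into a browser shows the raw XML that the server returns for that combination.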
You might also be interested in reading the MusicBrainz API docs, which describe what relationships are legal for which entities: https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2#Lookups
Please see if you can narrow down what's going on with MusicBrainz itself by talking directly to the API! If it seems to be doing something "wrong," we can file a bug with the MB folks.
I don't think we should make separate MB requests for every recording by default.
Indeed, when I check https://musicbrainz.org/ws/2/release/9c5c043e-bc69-4edb-81a4-1aaf9c81e6dc?inc=media+recordings+release-groups+labels+artist-credits+aliases+recording-level-rels+work-rels+work-level-rels+artist-rels, which is the equivalent of `musicbrainzngs.get_release_by_id(albumid, RELEASE_INCLUDES)` (with the same `RELEASE_INCLUDES`), it does not show any work relationships:
<track id="19fbfd3b-92b6-3c1b-b7a6-703048a128a7">
<position>1</position>
<number>1</number>
<title>Goldberg-Variationen, BWV 988: Aria</title>
<length>113893</length>
<artist-credit>
<name-credit>
<artist id="24f1766e-9635-4d58-a4d4-9413f9f98a4c">
<name>Johann Sebastian Bach</name>
<sort-name>Bach, Johann Sebastian</sort-name>
<disambiguation>German Baroque period composer &amp; musician</disambiguation>
<alias-list count="40">
<alias sort-name="Bach" type="Search hint" type-id="1937e404-b981-3cb7-8151-4c86ebfc8d8e">Bach</alias>
</alias-list>
</artist>
</name-credit>
</artist-credit>
<recording id="d57d7065-020f-4648-b1ca-12c9ba72f78d">
<title>Goldberg Variations, BWV 988: Aria</title>
<length>113786</length>
<artist-credit>
<name-credit>
<artist id="7002bf88-1269-4965-a772-4ba1e7a91eaa">
<name>Glenn Gould</name>
<sort-name>Gould, Glenn</sort-name>
<disambiguation>pianist</disambiguation>
<alias-list count="3">
<alias type="Search hint" type-id="1937e404-b981-3cb7-8151-4c86ebfc8d8e" sort-name="1)Glenn Gould">1)Glenn Gould</alias>
</alias-list>
</artist>
</name-credit>
</artist-credit>
<alias-list count="4">
<alias sort-name="Aria from Goldberg Variations BWV 988 (1955 recording) - Johann Sebastian Bach">Aria from Goldberg Variations BWV 988 (1955 recording) - Johann Sebastian Bach</alias>
</alias-list>
</recording>
</track>
As can be seen, the artists and their aliases are there, but that's pretty much all. Also, I would have expected recording dates to be there as well as instruments (Gould as pianist).
If we now look at https://musicbrainz.org/ws/2/release/db49c56b-7e11-4cbc-8fcc-577a031e8cd6?inc=media+recordings+release-groups+labels+artist-credits+aliases+recording-level-rels+work-rels+work-level-rels+artist-rels, which is a release that contains exactly the same recordings (it's pretty much the first medium of the release above), the first track is much more detailed:
<track id="73e69279-d8c2-3a26-89ae-dc67535be2ee">
<position>1</position>
<number>A1</number>
<length>112693</length>
<artist-credit>
<name-credit>
<artist id="24f1766e-9635-4d58-a4d4-9413f9f98a4c">
<name>Johann Sebastian Bach</name>
<sort-name>Bach, Johann Sebastian</sort-name>
<disambiguation>German Baroque period composer &amp; musician</disambiguation>
<alias-list count="40">
<alias sort-name="Bach" type-id="894afba6-2816-3c24-8072-eadb66bd04bc" type="Artist name">Bach</alias>
</alias-list>
</artist>
</name-credit>
</artist-credit>
<recording id="d57d7065-020f-4648-b1ca-12c9ba72f78d">
<title>Goldberg Variations, BWV 988: Aria</title>
<length>113786</length>
<artist-credit>
<name-credit>
<artist id="7002bf88-1269-4965-a772-4ba1e7a91eaa">
<name>Glenn Gould</name>
<sort-name>Gould, Glenn</sort-name>
<disambiguation>pianist</disambiguation>
<alias-list count="3">
<alias type-id="1937e404-b981-3cb7-8151-4c86ebfc8d8e" type="Search hint" sort-name="1)Glenn Gould">1)Glenn Gould</alias>
</alias-list>
</artist>
</name-credit>
</artist-credit>
<alias-list count="4">
<alias sort-name="Aria from Goldberg Variations BWV 988 (1955 recording) - Johann Sebastian Bach">Aria from Goldberg Variations BWV 988 (1955 recording) - Johann Sebastian Bach</alias>
</alias-list>
<relation-list target-type="artist">
<relation type-id="5c0ceac3-feb4-41f0-868d-dc06f6e27fc0" type="producer">
<target>64078387-5ff3-43d1-b203-38f98ef74c24</target>
<direction>backward</direction>
<artist id="64078387-5ff3-43d1-b203-38f98ef74c24">
<name>Howard H. Scott</name>
<sort-name>Scott, Howard H.</sort-name>
<disambiguation>classical music producer</disambiguation>
</artist>
</relation>
<relation type="instrument" type-id="59054b12-01ac-43ee-a618-285fd397e461">
<target>7002bf88-1269-4965-a772-4ba1e7a91eaa</target>
<direction>backward</direction>
<begin>1955-06-10</begin>
<end>1955-06-16</end>
<ended>true</ended>
<attribute-list>
<attribute type-id="b3eac5f9-7859-4416-ac39-7154e2e8d348">piano</attribute>
</attribute-list>
<artist id="7002bf88-1269-4965-a772-4ba1e7a91eaa">
<name>Glenn Gould</name>
<sort-name>Gould, Glenn</sort-name>
<disambiguation>pianist</disambiguation>
</artist>
</relation>
</relation-list>
<relation-list target-type="work">
<relation type-id="a3005666-a872-32c3-ad06-98af558e99b0" type="performance">
<target>6934e59b-e82c-3050-b0cf-70907db1f1a3</target>
<begin>1955-06-10</begin>
<end>1955-06-16</end>
<ended>true</ended>
<work id="6934e59b-e82c-3050-b0cf-70907db1f1a3">
<title>Goldberg-Variationen, BWV 988: Aria</title>
<language>zxx</language>
<language-list>
<language>zxx</language>
</language-list>
<relation-list target-type="artist">
<relation type-id="d59d99ea-23d4-4a80-b066-edca32ee158f" type="composer">
<target>24f1766e-9635-4d58-a4d4-9413f9f98a4c</target>
<direction>backward</direction>
<artist id="24f1766e-9635-4d58-a4d4-9413f9f98a4c">
<name>Johann Sebastian Bach</name>
<sort-name>Bach, Johann Sebastian</sort-name>
<disambiguation>German Baroque period composer &amp; musician</disambiguation>
</artist>
</relation>
</relation-list>
<relation-list target-type="work">
<relation type-id="ca8d3642-ce5f-49f8-91f2-125d72524e6a" type="parts">
<target>1d51e560-2a59-4e97-8943-13052b6adc03</target>
<ordering-key>1</ordering-key>
<direction>backward</direction>
<work id="1d51e560-2a59-4e97-8943-13052b6adc03">
<title>Goldberg-Variationen, BWV 988</title>
</work>
</relation>
<relation type-id="51975ed8-bbfa-486b-9f28-5947f4370299" type="arrangement">
<target>8b683c5c-74d7-4be4-9157-0b706f2f904a</target>
<work id="8b683c5c-74d7-4be4-9157-0b706f2f904a">
<title>Goldberg-Variationen, BWV 988: Aria</title>
<disambiguation>catch-all for arrangements</disambiguation>
</work>
</relation>
</relation-list>
</work>
</relation>
</relation-list>
</recording>
</track>
As we can see, it contains the work title and other relations: composer, arrangements, performers and their instruments, producer, recording date, etc. It is the same recording and we asked for the same information, but we get much more information for the smaller release.
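To compare two responses without reading the XML by eye, something like the following could summarize which relation lists each recording carries. It is a rough sketch that assumes Python 3.8+ (for the `{*}` namespace wildcard) and a saved copy of the API response; `relation_types_by_recording` is a made-up name.

```python
import xml.etree.ElementTree as ET


def relation_types_by_recording(xml_text):
    """Map each track's recording id to the target types of its relation lists.

    Hypothetical helper for eyeballing MusicBrainz release XML. The "{*}"
    prefix matches elements in any namespace (or none).
    """
    root = ET.fromstring(xml_text)
    summary = {}
    # Only recordings that sit directly under a <track> element.
    for recording in root.findall(".//{*}track/{*}recording"):
        # Direct children only, so relations nested inside works are ignored.
        lists = recording.findall("{*}relation-list")
        summary[recording.get("id")] = sorted(
            rl.get("target-type", "") for rl in lists
        )
    return summary
```

For the two responses above, I would expect an empty list for the recording in the big release and something like `['artist', 'work']` for the same recording in the small one.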
So it seems that the problem is with musicbrainzngs.
@Freso, do you know where I could submit the corresponding bug report?
Wait, I'm not sure this is a problem in musicbrainzngs, which is the name of the Python library—perhaps you mean the MusicBrainz server? There are details about the MB bug tracker on the wiki: https://musicbrainz.org/doc/Bug_Tracker
Answer from MB:
We don't return relationships for releases with more than 500 recordings, because otherwise they would just time out and not return anything at all. The best alternative for this is probably to browse recordings by release in this case.
Should we implement a check so that, if the release has more than 500 tracks, we fetch the data track by track?
Makes sense!
I don't think we can do that by default: fetching every recording for a 500-track album will take a very long time, and it will be wasted if the user doesn't need work information. Maybe it should be behind a configuration option? Or maybe it could be part of the responsibility of the `parentwork` plugin, so the process comes off the "critical path" of the import process?
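As a rough sketch of what an opt-in check could look like: the function names and the `fetch_large_releases` flag below are hypothetical, and the threshold is taken from the MB answer quoted above. It counts tracks in the parsed release dict, which may differ slightly from the number of distinct recordings.

```python
# Threshold from the MB answer: relationships are dropped above 500 recordings.
MAX_TRACKS_WITH_RELS = 500


def track_count(release):
    """Count tracks across all media of a parsed release dict."""
    return sum(
        len(medium.get("track-list", []))
        for medium in release.get("medium-list", [])
    )


def needs_extra_fetch(release, fetch_large_releases=False):
    """Return True only when the user opted in AND the release is too big.

    `fetch_large_releases` stands in for a hypothetical config option.
    """
    return fetch_large_releases and track_count(release) > MAX_TRACKS_WITH_RELS
```

With the flag off by default, ordinary imports would never pay for the extra requests.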
Also, I'm intrigued by this suggestion:
The best alternative for this is probably to browse recordings by release in this case.
Because this person didn't say "you have to fetch every recording individually," it suggests there may still be some way to fetch them all in bulk, by "browsing." Maybe that's worth looking into?
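If "browsing" means the browse endpoint, the paging could look roughly like this. It is a sketch, not tested against the live server: `browse_all_recordings` is a made-up name, and `browse_page` is assumed to be a callable with the keyword arguments of `musicbrainzngs.browse_recordings` (`release`, `includes`, `limit`, `offset`) that returns a dict with a `recording-list` and, I believe, a `recording-count`.

```python
def browse_all_recordings(browse_page, release_id, includes, chunk_size=100):
    """Collect all recordings of a release by paging through a browse call.

    `browse_page` is injected so the paging logic can be tried without
    network access; in practice it would be something like
    musicbrainzngs.browse_recordings (an assumption, see the text above).
    """
    recordings = []
    offset = 0
    while True:
        page = browse_page(release=release_id, includes=includes,
                           limit=chunk_size, offset=offset)
        batch = page.get("recording-list", [])
        recordings.extend(batch)
        offset += len(batch)
        # Fall back to the running offset if the count key is missing.
        total = page.get("recording-count", offset)
        # Stop on an empty page or once everything has been collected.
        if not batch or offset >= total:
            break
    return recordings
```

For a 500-track release with a chunk size of 100, that would be five requests instead of 500.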
Problem
I implemented the `work`, `mb_workid` and `work_disambig` tags not so long ago (https://github.com/beetbox/beets/pull/3272). My problem is: for some recordings, the works just don't get fetched. This especially concerns very big releases (20+ CDs, for example https://musicbrainz.org/release/9c5c043e-bc69-4edb-81a4-1aaf9c81e6dc or https://musicbrainz.org/release/9bcd75dd-995e-482b-8ba7-1ef074d253de). I tried to trace the error (by putting random prints in `beets/autotag/mb.py` and then running `beet mbsync` on the problematic releases). What I can see is:

While `RELEASE_INCLUDES` (in `beets/autotag/mb.py`) does contain `'work-rels'` and `'work-level-rels'`, for the releases I'm looking at, `TRACK_INCLUDES` doesn't: `musicbrainzngs.VALID_INCLUDES['recording']` contains `'work-rels'` but not `'work-level-rels'`, which is odd.

At line 494 in `beets/autotag/mb.py`, the result of `musicbrainzngs.get_release_by_id(albumid, RELEASE_INCLUDES)` doesn't contain any works, even if the recordings do have works and `RELEASE_INCLUDES` contains `'work-rels'` and `'work-level-rels'`.

I first tried to check `musicbrainzngs.get_recording_by_id(recording['id'], TRACK_INCLUDES)` for all the recordings, and it turns out it doesn't contain any works, because of the first error. If I now add `'work-rels'` to `TRACK_INCLUDES` and then look at `musicbrainzngs.get_recording_by_id(recording['id'], TRACK_INCLUDES)`, then it contains the works just fine.

So I'm wondering: why do we get the work relationships with `musicbrainzngs.get_recording_by_id(recording['id'], TRACK_INCLUDES)` but not with `musicbrainzngs.get_release_by_id(albumid, RELEASE_INCLUDES)`, even if both ask for `'work-rels'` (and `'work-level-rels'` for `musicbrainzngs.get_release_by_id`)?

A quick and dirty fix would be to ask for `musicbrainzngs.get_recording_by_id(recording['id'], TRACK_INCLUDES)` for each track. The problem is, there is a significant performance loss if we have one MusicBrainz query for each track instead of one per release, but I didn't look at it in too much detail. It seems to me that `musicbrainzngs` doesn't send all the info we ask for on very big releases. Could that be because the response is too big and they have a cap on the maximum size they can send?

Of course, I checked: the recordings concerned do have works on MB.
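Instead of scattering prints, a small check on the parsed release could list which recordings came back without works. This is a sketch under the assumption that musicbrainzngs exposes work relations on a recording under the `work-relation-list` key; `recordings_without_works` is a made-up name.

```python
def recordings_without_works(release):
    """Return ids of recordings in a parsed release that carry no work relations.

    `release` is assumed to be the dict found under the "release" key of the
    musicbrainzngs.get_release_by_id(...) result.
    """
    missing = []
    for medium in release.get("medium-list", []):
        for track in medium.get("track-list", []):
            recording = track.get("recording", {})
            # A missing or empty list both count as "no works fetched".
            if not recording.get("work-relation-list"):
                missing.append(recording.get("id"))
    return missing
```

Running this on both a big and a small release that share recordings would show directly whether the works are missing from the response.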
Setup
My configuration (output of `beet config`) is: