gchudov / cuetools.net

CD image processing suite with optimized lossless encoders in C#
http://cue.tools/

EAC plugin not finding all versions of CD in Musicbrainz database #47

Open caveguru opened 4 years ago

caveguru commented 4 years ago

When a CD is loaded in EAC and the cuetools plugin searches the Musicbrainz database it only finds a subset of the versions of the CD in the database. In the specific example a classical CD has been released many times both as a freestanding album and in various boxed set compilations, but only one of the freestanding versions is found. Other CDs from the same boxed set compilation are found in both their separately issued versions and the correct version from the boxed set. The example is for this collection: Deutsche Grammophon 477 8826

ha-korth commented 4 years ago

The plugin doesn't search MusicBrainz directly. The CUETools DataBase (CTDB) replicates and stores metadata from MusicBrainz (and the other databases), and the plugin reads that metadata from CTDB. The identifier (CTDB TOCID) used to look up the CD is derived from the physical layout of the CD (number of tracks, start positions and lengths).

plugin search settings http://cue.tools/wiki/CTDB_EAC_Plugin#Usage

Your example seems to be: https://musicbrainz.org/release/6fb68252-0c38-4d5a-8c1a-b3ea211bd8d2

There are no physical layouts attached to this release (https://musicbrainz.org/release/6fb68252-0c38-4d5a-8c1a-b3ea211bd8d2/discids) for exact matches, but the same CD(s) from alternate releases can have the same layout(s). There may also be possible fuzzy search matches. Without the CTDB TOCID(s), I can't look up the possible metadata matches to the TOC layout of your CD(s) in CTDB.

This could be CD1 (or not):
http://db.cuetools.net/?tocid=KwcDxm.G8Frggt8Lm7ZbYUEIfwU-
http://db.cuetools.net/cd/8150710
http://db.cuetools.net/lookup2.php?version=3&ctdb=1&metadata=extensive&fuzzy=1&toc=0:33304:78533:101823:151518:212957:249730:286980:325908
https://musicbrainz.org/cdtoc/attach?toc=1%208%20326058%20150%2033454%2078683%20101973%20151668%20213107%20249880%20287130

or perhaps this more common TOC layout:
http://db.cuetools.net/top.php?tocid=bExefIn6EIS2k7x6sH6JPE07n9o-
http://db.cuetools.net/cd/654805
http://db.cuetools.net/lookup2.php?version=3&ctdb=1&metadata=extensive&fuzzy=1&toc=33:33183:78183:101393:150933:212133:248808:285933:324633
https://musicbrainz.org/cdtoc/attach?toc=1%208%20324783%20183%2033333%2078333%20101543%20151083%20212283%20248958%20286083
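The lookup2.php and cdtoc/attach links above appear to carry the same layout in two notations: the CTDB `toc` parameter lists track start sectors followed by the lead-out, while the MusicBrainz `toc` lists first track, last track, lead-out and track offsets, each shifted by the 150-sector lead-in. A minimal Python sketch of that conversion, assuming an audio-only disc whose first track is numbered 1 (the function names are made up for illustration and are not part of CUETools):

```python
from urllib.parse import quote

LEAD_IN = 150  # sectors added to every position in the MusicBrainz notation


def ctdb_toc_to_musicbrainz(ctdb_toc: str) -> str:
    """Convert 'start1:start2:...:leadout' into a MusicBrainz toc string."""
    sectors = [int(part) for part in ctdb_toc.split(":")]
    *track_starts, lead_out = sectors
    # Assumption: tracks are numbered from 1 and all of them are audio.
    fields = [1, len(track_starts), lead_out + LEAD_IN]
    fields += [start + LEAD_IN for start in track_starts]
    return " ".join(str(field) for field in fields)


def musicbrainz_attach_url(ctdb_toc: str) -> str:
    """Build the cdtoc/attach link for a CTDB toc string."""
    toc = ctdb_toc_to_musicbrainz(ctdb_toc)
    return "https://musicbrainz.org/cdtoc/attach?toc=" + quote(toc)


print(musicbrainz_attach_url(
    "33:33183:78183:101393:150933:212133:248808:285933:324633"))
```

Run on the two `toc` values from the lookup2.php links, this reproduces the two MusicBrainz links quoted above.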

caveguru commented 4 years ago

OK, so a couple questions- the link you shared is the correct release and yes I see there are no disc ID's, and yet when I have been ripping other CD's from this same set (specifically discs 6 through 11 of 12), the correct information has been found. It was some of the earlier discs and disc 12 that definitely did not find the set. Why would some work and not others?
- Also, I can't seem to find the TOCID identifier anywhere in the EAC app. Where can I look up which id is being found by the search? Do I need to use the CUEtools app instead? The last two links you shared do appear to be CD1, and I might have found that one when I searched for it (there are many pressings of this disc), but I definitely didn't find disc 12.
- Also, how do I search the Cuetools database that you linked to so I can see if the CD is in there? There doesn't seem to be any option to force the information to come from a specific entry in the database. I realize the image match is critical to get the AccurateRip benefit so maybe this isn't a viable idea. I guess I should use my CDs to update the Musicbrainz database via the Picard app so other people's searches can find the correct image?

ha-korth commented 4 years ago

Why would some work and not others?

Alternate releases can have the same TOC layout(s). The same CD can be released multiple times: as a single CD, as part of various multi-CD sets, or even licensed for release by a different label. Some may have the physical layouts attached to releases in MusicBrainz, others may not.

I can't seem to find the TOCID identifier anywhere in the EAC app. Where can I look up which id is being found by the search?

The CTDB TOCID isn't available in EAC until after the complete CD is ripped. It can then be found in the EAC extraction logfile:

 [CTDB TOCID: ZYX_3CFBpbc1gORqnP6h9aPPlVY-] found
 Submit result: already submitted
 Track | CTDB Status
   1   | (607/847) Accurately ripped
   2   | (617/847) Accurately ripped
   3   | (628/847) Accurately ripped
   4   | (623/847) Accurately ripped
   5   | (605/847) Accurately ripped
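For anyone collecting TOCIDs from several rips, the log excerpt above can be read with a couple of regular expressions. This is only a sketch based on the lines shown here; it assumes the log has already been read into a string, and the function names are hypothetical:

```python
import re

# Matches a line such as: [CTDB TOCID: ZYX_3CFBpbc1gORqnP6h9aPPlVY-] found
TOCID_RE = re.compile(r"\[CTDB TOCID: ([A-Za-z0-9._-]+)\]")

# Matches a line such as:    1   | (607/847) Accurately ripped
TRACK_RE = re.compile(r"^\s*(\d+)\s*\|\s*\((\d+)/(\d+)\)\s*(.+?)\s*$", re.MULTILINE)


def find_ctdb_tocid(log_text):
    """Return the first CTDB TOCID in the log text, or None if absent."""
    match = TOCID_RE.search(log_text)
    return match.group(1) if match else None


def track_counts(log_text):
    """Map track number to the two counts shown in parentheses."""
    return {int(track): (int(first), int(second))
            for track, first, second, _status in TRACK_RE.findall(log_text)}


sample = """ [CTDB TOCID: ZYX_3CFBpbc1gORqnP6h9aPPlVY-] found
 Submit result: already submitted
 Track | CTDB Status
   1   | (607/847) Accurately ripped
   2   | (617/847) Accurately ripped
"""
print(find_ctdb_tocid(sample))
print(track_counts(sample))
```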

how do I search the Cuetools database that you linked to so I can see if the CD is in there?

CTDB queries

http://db.cuetools.net/?tocid=
http://db.cuetools.net/top.php?tocid=
http://db.cuetools.net/?artist=
http://db.cuetools.net/top.php?artist=

With top.php the results are sorted by most entries; without top.php they are sorted by most recent. So for the above I would use:

http://db.cuetools.net/?tocid=ZYX_3CFBpbc1gORqnP6h9aPPlVY-
http://db.cuetools.net/top.php?tocid=ZYX_3CFBpbc1gORqnP6h9aPPlVY-
http://db.cuetools.net/?artist=Pink_Floyd
http://db.cuetools.net/top.php?artist=Pink_Floyd

the other links are found within the pages.
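The query patterns above are simple enough to generate from a script. A small Python sketch that uses only the two parameters named in this thread (`tocid` and `artist`); the helper name is made up, and nothing else about the CTDB site is assumed:

```python
from urllib.parse import urlencode

BASE = "http://db.cuetools.net/"


def ctdb_query_urls(key, value):
    """Return the 'most recent' and 'most entries' query URLs for one search."""
    if key not in ("tocid", "artist"):
        raise ValueError("only 'tocid' and 'artist' are mentioned in the thread")
    query = urlencode({key: value})
    return {
        "recent": f"{BASE}?{query}",        # sorted by most recent
        "top": f"{BASE}top.php?{query}",    # sorted by most entries
    }


for label, url in ctdb_query_urls("tocid", "ZYX_3CFBpbc1gORqnP6h9aPPlVY-").items():
    print(label, url)
```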

caveguru commented 4 years ago

Thanks for the answers but I'm not sure this solves my problem (yet). I think I need the CD's image to appear in the Musicbrainz database. I also don't understand- if that 12 CD set appears correctly for some of the CD's in the set shouldn't those disc ID's appear on the musicbrainz tab that you linked to? I guess I will re-rip one of those CD's to get the TOCID and trace it back to the MB entry since there could be two 12-disc entries in MB. I'll also rip disc 12 using the incorrect entry so I can get that TOCID and post it. What about me adding disc ID's via Picard- would that do anything to help me find a match later in the Cuetools plugin?

ha-korth commented 4 years ago

if that 12 CD set appears correctly for some of the CD's in the set shouldn't those disc ID's appear on the musicbrainz tab that you linked to?

If someone adds them and the edits/changes are approved.

What about me adding disc ID's via Picard- would that do anything to help me find a match later in the Cuetools plugin?

see above. The CTDB replicates MusicBrainz often so if the additions are approved the plugin would have access to them.

I guess I will re-rip one of those CD's to get the TOCID and trace it back to the MB entry since there could be two 12-disc entries in MB.

In CUERipper (part of CUETools), if you insert a CD and click the MusicBrainz icon at the bottom (before you rip the CD) it should send you to the MusicBrainz CD lookup page. This won't get you the CTDB TOCID but it should show you releases that have a version with the same layout. This is the result from the CD I used in the previous post when I click the icon: https://musicbrainz.org/cdtoc/attach?toc=1%205%20199385%20150%2061117%2095117%20119447%20143157

I'll also rip disc 12 using the incorrect entry so I can get that TOCID and post it.

EAC setting to automatically write the extraction log to hdd: EAC > EAC Options > Tools tab > Automatically write status report after extraction

caveguru commented 4 years ago

OK, here's the original disc's TOCID: lOWhl0Ou.UEwK8psvzSBKeaDUr4-

and here's the TOC from the log:

 Track |   Start  |  Length  | Start sector | End sector 
---------------------------------------------------------
    1  |  0:00.00 | 16:35.00 |         0    |    74624   
    2  | 16:35.00 |  1:45.42 |     74625    |    82541   
    3  | 18:20.42 |  2:09.33 |     82542    |    92249   
    4  | 20:30.00 |  5:57.35 |     92250    |   119059   
    5  | 26:27.35 |  1:19.22 |    119060    |   125006   
    6  | 27:46.57 |  6:10.00 |    125007    |   152756   
    7  | 33:56.57 |  0:18.08 |    152757    |   154114   
    8  | 34:14.65 |  5:00.10 |    154115    |   176624   
    9  | 39:15.00 |  2:28.55 |    176625    |   187779   
   10  | 41:43.55 |  1:29.20 |    187780    |   194474   
   11  | 43:13.00 |  9:22.07 |    194475    |   236631   
   12  | 52:35.07 |  9:18.68 |    236632    |   278549   
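The Start and Start sector columns in a table like this give the same position in two notations, since an audio CD holds 75 sectors (frames) per second. A short Python sketch of the conversion, handy when only the time column is available (the function name is just for illustration):

```python
FRAMES_PER_SECOND = 75  # audio CD sectors per second


def msf_to_sector(msf):
    """Convert an EAC time such as '16:35.00' (min:sec.frames) to a sector."""
    minutes, rest = msf.strip().split(":")
    seconds, frames = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * FRAMES_PER_SECOND + int(frames)


# Track 12 from the table: start plus length, minus one, gives the end sector.
start = msf_to_sector("52:35.07")
length = msf_to_sector("9:18.68")
print(start, start + length - 1)
```

For track 12 this gives 236632 and 278549, matching the Start sector and End sector columns.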

I also found this other example. It's disc 5 of a 10 CD set. The first 4 discs have all been recognized correctly but this one was not even though there are 2 (!) Disc ID's for it in the database. I ripped it according to an ID found in freedb by the plugin but freedb didn't have the cover art so I would prefer that the plugin found the disc on MB. Curiously in the search window none of the entries from MB showed up. I think this also happened for the 12 of 12 disc above. TOCID: LxFW7HN4aYKXZAmaWkFvxQLJHIM-

And the Musicbrainz page showing the two ID for disc 5: https://musicbrainz.org/release/716e698f-f64d-4d7b-960b-994dc70e78b5/discids

And the TOC as ripped:

 Track |   Start  |  Length  | Start sector | End sector 
---------------------------------------------------------
    1  |  0:00.33 | 10:29.00 |        33    |    47207   
    2  | 10:29.33 | 14:26.00 |     47208    |   112157   
    3  | 24:55.33 |  8:07.00 |    112158    |   148682   
    4  | 33:02.33 |  5:47.00 |    148683    |   174707   
    5  | 38:49.33 |  4:24.00 |    174708    |   194507   
    6  | 43:13.33 |  6:59.50 |    194508    |   225982   
    7  | 50:13.08 |  4:09.25 |    225983    |   244682   
    8  | 54:22.33 |  4:59.00 |    244683    |   267107   

Is it possible that the TOC's aren't lining up due to variations from disc to disc? Clearly these are copies of the same master but possibly pressed in different years - the MB entry is for a set sold under the Philips label, while the set I have is under the Decca label, but with the same cover art minus the record label info. I can't find the Decca entry in MB, but they are the same because Decca was formed when Philips was bought out in 1999.

ha-korth commented 4 years ago

Is it possible that the TOC's aren't lining up due to variations from disc to disc? Clearly these are copies of the same master but possibly pressed in different years

Or at different plants all over the world, each with a copy of the same master but using different software and equipment that may or may not place the start of the first track immediately after the 150-sector lead-in. An engineer may also select a different sector as the best start position for each track, or pad a little extra silence after the end of the last track. Played start to finish, the CDs would be the same. Identified by track start positions, they are different.

In your second example this is your CD https://musicbrainz.org/cdtoc/attach?toc=1%208%20267258%20183%2047358%20112308%20148833%20174858%20194658%20226133%20244833 http://db.cuetools.net/cd/859709

The first track begins at sector 183 (33 sectors after the 150 sector lead-in as shown by the EAC TOC) and has a length of 267258 sectors (including lead-in). The 2 Disc ID's in your link start on sector 182 and have lengths of 267257 and 267407 sectors. All track start positions are the same but the last track has 150 extra sectors padded on one of them. Your CD is the same as the shorter one except every track starts 1 sector later.
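The arithmetic in the paragraph above can be checked against the EAC table for this disc: add the 150-sector lead-in to each start sector, and take the lead-out as the last end sector plus one, plus 150. A minimal Python sketch, assuming an audio-only disc with tracks numbered from 1 (the function name is hypothetical):

```python
LEAD_IN = 150  # sectors before sector 0 of the EAC table


def eac_rows_to_musicbrainz_toc(rows):
    """rows: (start_sector, end_sector) pairs per track, as listed by EAC."""
    starts = [start for start, _end in rows]
    lead_out = rows[-1][1] + 1  # first sector after the last track
    fields = [1, len(rows), lead_out + LEAD_IN]
    fields += [start + LEAD_IN for start in starts]
    return " ".join(str(field) for field in fields)


# Start and end sectors of disc 5, copied from the TOC posted above.
disc5 = [
    (33, 47207), (47208, 112157), (112158, 148682), (148683, 174707),
    (174708, 194507), (194508, 225982), (225983, 244682), (244683, 267107),
]
print(eac_rows_to_musicbrainz_toc(disc5))
```

The printed string is the `toc` value of the MusicBrainz cdtoc/attach link given for this CD, with spaces in place of `%20`.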

Note: When calculating the CTDB TOCID the actual start position of track 1 isn't used. All CDs are calculated as if track 1 starts immediately following the lead-in so your CD and this one https://musicbrainz.org/cdtoc/oVFHF3JmkgS51T68vx_RfNpl5aY- http://db.cuetools.net/cd/648712 would fall under the same CTDB TOCID but have different MusicBrainz Disc IDs (which do use the actual start position of track 1).
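One way to picture the note above: if every position is taken relative to the start of track 1, two pressings that differ only by a constant shift become identical. The Python sketch below shows that normalization only. It does not compute a real CTDB TOCID (the hashing is not described in this thread), and the second layout is reconstructed from the "every track starts 1 sector later" description rather than copied from MusicBrainz:

```python
def normalize_toc(track_starts, lead_out):
    """Shift a TOC so that track 1 starts at offset 0."""
    first = track_starts[0]
    return [start - first for start in track_starts], lead_out - first


# MusicBrainz-style offsets for the ripped disc 5 discussed above.
ripped_starts = [183, 47358, 112308, 148833, 174858, 194658, 226133, 244833]
ripped_lead_out = 267258

# Hypothetical reconstruction of the shorter Disc ID: the same layout with
# every position one sector earlier (track 1 at 182, length 267257).
shifted_starts = [start - 1 for start in ripped_starts]
shifted_lead_out = ripped_lead_out - 1

same_layout = (normalize_toc(ripped_starts, ripped_lead_out)
               == normalize_toc(shifted_starts, shifted_lead_out))
print(same_layout)
```

The raw offsets differ, so an identifier built from them (such as the MusicBrainz Disc ID) differs too, while the normalized layouts compare equal.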

Perhaps you should start an issue report for CTDB https://github.com/gchudov/db.cue.tools to at least include all variants (where track 1 starts on sector 0, 32, 33, 37 etc.) as MusicBrainz metadata for that CD.