Closed sampsyo closed 10 years ago
Hey, can I take this up? I'm a newbie to the project, so it seems this may be something I can tackle.
@sampsyo: yeah, it makes complete sense. Regarding what constitutes a null field: I know there was some discussion about it viz. None vs. empty strings, etc. (well, that's certainly a whole lot of Latin abbreviations!) Anyway, the point is, how best to check for these?
@varunagrawal: feel free to tackle it if you want. The relevant functions would be _group and _duplicates.
Cool. I'm guessing the most predictable way to classify null values is to special-case only None and the empty string—other falsey values (including False) should probably not be considered nulls.
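A minimal sketch of that classification, assuming a hypothetical helper named is_null (it is not a function in beets): only None and the empty string count as null, so False and 0 remain real values.

```python
def is_null(value):
    """Return True only for None and the empty string.

    Hypothetical helper for illustration; not part of beets.
    """
    return value is None or value == ""


# Other falsey values are deliberately not treated as null.
for candidate in (None, "", False, 0, "abc"):
    print(repr(candidate), is_null(candidate))
```

Comparing against the two null values explicitly, rather than testing truthiness, is what keeps fields such as comp: False or track: 0 from being mistaken for missing data.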
@varunagrawal Absolutely! Please ask questions if you need help getting familiar with beets internals. Or just open a pull request when you're ready.
Yes, in my library, I have 434 of the same duplicate with the default settings :(
@ian-kelling: did you delete a comment or something? I got an email with more information:
I have 6554 tracks, duplicates plugin says 6553 are duplicates. I do have a few hundred with no musicbrainz info, but that is a bit ridiculous.
But you now say you have 434 dupes. Which is it? Anyway, I'll need a few things:
beet config
beet info on two of the tracks that get reported as duplicates.
If I can see that information, I can see what's going wrong and fix it or help you fix it.
Any chance those few hundred tracks with no musicbrainz metadata are the duplicated ones? If so, then we should have a fix very shortly.
Sorry for the delay. I will be responsive if you want me to test anything. The issue is still happening with the latest sources. mar 11: d091c7e5b4ddc2cccd311796b0e676e2b2187193
When I initially wrote 6553 duplicates, I was doing something wrong, which I quickly realized and changed.
$ beet config
directory: /i/music
import:
    log: /a/dt/beetlog.log
    move: yes
    quiet_fallback: skip
match:
    strong_rec_thresh: 0.07
library: /a/bin/data/musiclibrary.blb
plugins: discogs duplicates web
discogs:
    source_weight: 0.5
duplicates:
    album: no
    full: no
    format: ''
    keys: [mb_trackid, mb_albumid]
    move: no
    tag: no
    path: no
    copy: no
    count: no
    checksum:
    delete: no
web:
    host: ''
    port: 8337
edit: grabbing the info plugin and output
$ beet info 01\ Spring\ -\ Concerto\ #1\ in\ E\ major\ -\ 1\ -\ Allegro.flac
/i/music/Vivaldi/Four Seasons (1960 EMI 1988)/01 Spring - Concerto #1 in E major - 1 - Allegro.flac
title: Spring - Concerto #1 in E major - 1 - Allegro
artist: Vivaldi
artist_sort:
artist_credit:
album: Four Seasons (1960 EMI 1988)
albumartist:
albumartist_sort:
albumartist_credit:
genre: Classical
composer:
grouping:
year: 1988
month: 0
day: 0
track: 1
tracktotal: 0
disc: 0
disctotal: 0
lyrics:
comments:
bpm: 107
comp: False
mb_trackid:
mb_albumid:
mb_artistid:
mb_albumartistid:
albumtype:
label:
acoustid_fingerprint:
acoustid_id:
mb_releasegroupid:
asin:
catalognum:
script:
language:
country:
albumstatus:
media:
albumdisambig:
disctitle:
encoder:
rg_track_gain: -1.76
rg_track_peak: 0.791321
rg_album_gain: -0.59
rg_album_peak: 0.871582
original_year: 0
original_month: 0
original_day: 0
length: 197.466666667
bitrate: 716341
format: FLAC
samplerate: 44100
bitdepth: 16
channels: 2
album art: False
$ beet info 02\ Spring\ -\ Concerto\ #1\ in\ E\ major\ -\ 2\ -\ Largo.flac
/i/music/Vivaldi/Four Seasons (1960 EMI 1988)/02 Spring - Concerto #1 in E major - 2 - Largo.flac
title: Spring - Concerto #1 in E major - 2 - Largo
artist: Vivaldi
artist_sort:
artist_credit:
album: Four Seasons (1960 EMI 1988)
albumartist:
albumartist_sort:
albumartist_credit:
genre: Classical
composer:
grouping:
year: 1988
month: 0
day: 0
track: 2
tracktotal: 0
disc: 0
disctotal: 0
lyrics:
comments:
bpm: 1
comp: False
mb_trackid:
mb_albumid:
mb_artistid:
mb_albumartistid:
albumtype:
label:
acoustid_fingerprint:
acoustid_id:
mb_releasegroupid:
asin:
catalognum:
script:
language:
country:
albumstatus:
media:
albumdisambig:
disctitle:
encoder:
rg_track_gain: 8.78
rg_track_peak: 0.217041
rg_album_gain: -0.59
rg_album_peak: 0.871582
original_year: 0
original_month: 0
original_day: 0
length: 165.066666667
bitrate: 598113
format: FLAC
samplerate: 44100
bitdepth: 16
channels: 2
album art: False
@ian-kelling: right, so that makes sense, we still haven't pushed the empty field fix. @varunagrawal wanted to tackle it, so we've been on hold for the moment. I'll wait another day, and then push a fix myself if necessary. In the meanwhile, you can either (1) grab mb_trackids for those 400-something tracks you have or (2) change the configuration option keys (or pass -k on the command-line) to a list of other keys that should be unique. You could do, for example, -k artist album title, or for a more accurate metric, -C 'ffmpeg -i {file} -f crc -', which will decode the audio portion from each track and checksum it.
Thanks. No rush on my account, I've gotten the duplicates in my collection figured out :)
Fixed with e8f6781.
The duplicates plugin's grouping does not currently attempt to detect null fields. This results in duplicates being reported when several tracks (or albums) are missing metadata. For the default key set, for example, items that don't have album or track IDs set all appear to be duplicates. This is explainable but it's confusing output for the uninitiated.
Perhaps the plugin should not print out objects in the "null" grouping, i.e. those for which all the key fields are empty. Would that make sense to everybody?
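One way the proposal could look, as a sketch only: items are plain dicts here rather than beets objects, and group_by_keys and find_duplicates are hypothetical names, not the plugin's actual _group and _duplicates functions. A group is skipped when every key field is None or the empty string.

```python
from collections import defaultdict


def group_by_keys(items, keys):
    """Group items (plain dicts) by the tuple of their values for `keys`."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item.get(k) for k in keys)].append(item)
    return groups


def find_duplicates(items, keys):
    """Yield groups of two or more items, skipping all-null groups."""
    for key, members in group_by_keys(items, keys).items():
        # Skip the "null" grouping: every key field is None or "".
        if all(v is None or v == "" for v in key):
            continue
        if len(members) > 1:
            yield key, members


# Illustrative data only: two untagged tracks and one genuine duplicate pair.
library = [
    {"title": "a", "mb_trackid": "", "mb_albumid": ""},
    {"title": "b", "mb_trackid": None, "mb_albumid": ""},
    {"title": "c", "mb_trackid": "x1", "mb_albumid": "y1"},
    {"title": "d", "mb_trackid": "x1", "mb_albumid": "y1"},
]
dupes = list(find_duplicates(library, ["mb_trackid", "mb_albumid"]))
```

With the default keys, the two untagged tracks are no longer reported, while the pair that shares real IDs still is.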