Marekkon5 / onetagger

Music tagger for Windows, MacOS and Linux with Beatport, Discogs, Musicbrainz, Spotify, Traxsource and many other platforms support.
https://onetagger.github.io/
GNU General Public License v3.0
629 stars, 34 forks

Feature Request: Chained platform sources with different tags extracted [via ISRC] #260

Open · eejd opened this issue 1 year ago

eejd commented 1 year ago

Summary: When the files you would like to Auto Tag have a known source (e.g. Beatport, an iTunes purchase, etc.) and you can extract the primary metadata from the correct platform, it should be possible to then use the canonical identifier (i.e. the ISRC) to retrieve additional tags (e.g. Mood, Sub-genre, etc.) and Audio Features from other platforms. Ideally, this would be a single Auto Tag sequence: one or more primary platforms are tried sequentially to get the core metadata and the ISRC, and then secondary platforms are searched for additional information. Only the primary platforms would determine 'success' or 'failure'. The secondaries are used just to get the additional data and are therefore never failures, even if the track is not found on them.

Details: I often know the source of my files or have a canonical source I would like to use. For example, I have online playlists on streaming services that I use to identify new material and evaluate whether it is interesting enough to add to my DJ catalog. I also often get tracks from platforms I know to be the source. Ideally, I would like to use 1T in these cases to get the full metadata and a canonical ISRC from an Auto Tag run, but also get any and all extra metadata available. Currently, even if I know that a song comes from a platform (e.g. Beatport), I cannot select it as a 'primary' source and still get extra tags from additional sources. My current experimental workflow is to have one configuration for a few primary sources, where I get all tags, and then a secondary configuration that I run afterwards to get Mood, extra Genres, Sub-genre, etc. More generally, other users might want workflows where they chain one or more platforms for primary searches with one or more platforms for secondary searches. This should also speed up large batch searches, especially if the files largely come from (or are likely to be found on) one or two platforms, since the remaining searches can be ISRC-based and only collect a few tags.

For me, a related problem is interpreting the 'success' and 'failure' counts, as these are not aggregated or handled differently depending on whether a platform is being used for primary identification or to get supplemental data. I am currently using 1T on small batches of incoming tracks, but I would like to go back and re-tag my entire collection to ensure complete tags and maximal metadata, which I can't see doing without some form of this feature.

Possible implementations:

  1. A simple solution would be to group platforms into 'primary' and 'secondary' (potentially allowing the same platform in both groups), where the primary group does the matching and ensures an ISRC, and the secondary group provides additional tags. (In this case, the tags, overwrite, etc. options would also be settable separately for 'primary' and 'secondary'.)
  2. A more complex approach (not sure if it would be useful) would be a tree search strategy where you could define primary, fallback, and additional-metadata platforms.
  3. Simply make the tag update options depend on having found a first match (with an ISRC), after which every remaining platform becomes a secondary.

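
A minimal Rust sketch of option 1, assuming a hypothetical `Platform` trait with `match_track` (title/artist search) and `lookup_isrc` (ISRC-only lookup); neither is OneTagger's actual API. Primaries are tried in order until one returns a match carrying an ISRC, and only that step decides success. Secondaries are then queried by ISRC and merely fill in tags the primary did not provide; a secondary miss is recorded but never counted as a failure. The in-memory `Mock` platform and the dummy ISRC are made up for illustration.

```rust
use std::collections::HashMap;

// Hypothetical tag map for one track, e.g. "title", "artist", "isrc", "mood".
type Tags = HashMap<String, String>;

// Hypothetical platform interface for illustration; not OneTagger's real API.
trait Platform {
    fn name(&self) -> &str;
    // Full title/artist match, used for primary platforms.
    fn match_track(&self, title: &str, artist: &str) -> Option<Tags>;
    // ISRC-only lookup, used for secondary platforms.
    fn lookup_isrc(&self, isrc: &str) -> Option<Tags>;
}

#[derive(Debug, PartialEq)]
enum Outcome {
    // Counted as a success; secondary misses are informational only.
    Identified { tags: Tags, secondary_misses: Vec<String> },
    // The only real failure: no primary platform matched with an ISRC.
    Unidentified,
}

fn tag_track(
    title: &str,
    artist: &str,
    primaries: &[&dyn Platform],
    secondaries: &[&dyn Platform],
) -> Outcome {
    // Try primaries in order; accept the first match that carries an ISRC.
    let found = primaries
        .iter()
        .find_map(|p| p.match_track(title, artist).filter(|t| t.contains_key("isrc")));
    let mut tags = match found {
        Some(t) => t,
        None => return Outcome::Unidentified,
    };
    let isrc = tags["isrc"].clone();
    let mut secondary_misses = Vec::new();
    for s in secondaries {
        match s.lookup_isrc(&isrc) {
            // Only fill in tags the primary did not already provide.
            Some(extra) => {
                for (k, v) in extra {
                    tags.entry(k).or_insert(v);
                }
            }
            None => secondary_misses.push(s.name().to_string()),
        }
    }
    Outcome::Identified { tags, secondary_misses }
}

// In-memory stand-in for a real platform, for demonstration only.
struct Mock {
    name: &'static str,
    catalog: Vec<Tags>,
}

impl Platform for Mock {
    fn name(&self) -> &str {
        self.name
    }
    fn match_track(&self, title: &str, artist: &str) -> Option<Tags> {
        self.catalog
            .iter()
            .find(|t| {
                t.get("title").map(String::as_str) == Some(title)
                    && t.get("artist").map(String::as_str) == Some(artist)
            })
            .cloned()
    }
    fn lookup_isrc(&self, isrc: &str) -> Option<Tags> {
        self.catalog
            .iter()
            .find(|t| t.get("isrc").map(String::as_str) == Some(isrc))
            .cloned()
    }
}

fn mk_tags(pairs: &[(&str, &str)]) -> Tags {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    // Dummy ISRC, made up for the example.
    let primary = Mock {
        name: "primary",
        catalog: vec![mk_tags(&[("title", "Track"), ("artist", "Artist"), ("isrc", "XXABC2400001")])],
    };
    let moods = Mock { name: "moods", catalog: vec![mk_tags(&[("isrc", "XXABC2400001"), ("mood", "Dark")])] };
    let empty = Mock { name: "empty", catalog: Vec::new() };
    match tag_track("Track", "Artist", &[&primary], &[&moods, &empty]) {
        Outcome::Identified { tags, secondary_misses } => {
            println!("identified: mood = {:?}, secondary misses = {:?}", tags.get("mood"), secondary_misses);
        }
        Outcome::Unidentified => println!("true failure: no primary match"),
    }
}
```

Keeping the primary's tags and only filling gaps from secondaries is one possible policy; the per-group overwrite settings suggested in option 1 would replace the `or_insert` call.
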
Marekkon5 commented 1 year ago

Hello, I don't know if I understood you fully, however:

Let me know if that's what you meant or if I misinterpreted this. Thanks

eejd commented 1 year ago

Thanks, that is basically what I understood about the process. Let me explain a bit further. Part of what I have come to realize over the last few months of playing (and tagging and retagging and re-retagging 😂) is that there are different workflows I can use depending on how clearly I know the song's source and current metadata. However, my larger catalog is complicated by past attempts to match with Beets, Picard, etc. I realized that for recent material, matching up all the metadata and getting the ISRC is easy when I know the source. So I hoped that I could then pull only Mood, Genres, and Sub-genre from the additional sources, where the title or other metadata may be slightly different. I've been using fairly strict matching so that, once I have the workflow sorted, I can re-process my entire collection without having to manually check every song. I then plan to use QuickTag on music as I organize it, when I prep sets or when I have time to listen to and update tracks.

What I was hoping to see easily was the number of true failures (i.e. the track cannot be matched) versus failures to get Genre, Mood, etc. from secondary sources. I care a lot about the first and not much about the second.

One thing that would help is more documentation on the matching process, what the strictness measures mean, and which sources can use ISRC for primary matching. I see in the code that some use it by default and then fall back to matching. One option for my use case would be the ability to match by ISRC only, as I think you can with the track source for some platforms. A second would be the ability to report success/failure per track rather than per platform. I care most about knowing whether the track was identified at all. I care less about failing to get genre data, since in that case I will manually add at least a single placeholder genre to the failed tracks. In the first case, I want to make sure every track matches somewhere. In the second, I'd use the M3U list of true genre failures to tag those tracks with a basic genre. Let me know if this makes sense. From what I've seen, the ISRC is the most stable and reliable track identifier. So in this mental model, pulling the label, remix artist, or anything else I can from any source after getting the track/ISRC is similar to, but conceptually different from, matching to identify the track in the first place. (Similarly, I assume getting Spotify Audio Features would be something I'd do after having found the ISRC.)
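
A small sketch of per-track (rather than per-platform) reporting, assuming a hypothetical `TrackResult` that records whether a primary platform identified the file and whether any platform supplied a genre. It produces two plain M3U lists (one path per line): true failures, and identified tracks that are only missing a genre. The names and fields are illustrative, not OneTagger internals.

```rust
// Hypothetical per-file result after a run; not OneTagger's internal type.
struct TrackResult {
    path: String,
    // True if any primary platform identified the track (e.g. via ISRC).
    identified: bool,
    // True if any platform supplied at least one genre.
    has_genre: bool,
}

// Plain M3U: one path per line.
fn to_m3u(paths: &[&str]) -> String {
    paths.iter().map(|p| format!("{}\n", p)).collect()
}

// Returns (unidentified playlist, identified-but-missing-genre playlist).
fn failure_lists(results: &[TrackResult]) -> (String, String) {
    let unidentified: Vec<&str> = results
        .iter()
        .filter(|r| !r.identified)
        .map(|r| r.path.as_str())
        .collect();
    // Genre misses only matter for tracks that were actually identified.
    let missing_genre: Vec<&str> = results
        .iter()
        .filter(|r| r.identified && !r.has_genre)
        .map(|r| r.path.as_str())
        .collect();
    (to_m3u(&unidentified), to_m3u(&missing_genre))
}

fn main() {
    let results = vec![
        TrackResult { path: "incoming/a.mp3".to_string(), identified: true, has_genre: true },
        TrackResult { path: "incoming/b.mp3".to_string(), identified: false, has_genre: false },
        TrackResult { path: "incoming/c.flac".to_string(), identified: true, has_genre: false },
    ];
    let identified = results.iter().filter(|r| r.identified).count();
    println!("identified {}/{}", identified, results.len());
    let (unidentified, missing_genre) = failure_lists(&results);
    print!("unidentified:\n{}missing genre:\n{}", unidentified, missing_genre);
}
```
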

In general, some more developer-friendly info would be nice! (Especially as I'd like to contribute as I learn more about the code base! I'm not a Rust developer but have plenty of programming experience and hope to be helpful soon.) :-)

Marekkon5 commented 1 year ago

Hello,

In the latest commit I've fixed the failed list and made statuses show for all platforms. You can get the binary from the Actions tab to test it out.

You can also hover over the ❗ or ✔️ icon to preview the error or status (i.e. the accuracy or the reason for failure). You can always check the log for more info or detailed progress reporting.

As for the ISRC: currently only 2 platforms support ISRC, Spotify and Beatport. Because of that I didn't make it anything special, forced, or a separate option; it's just an almost invisible match-rate improvement. You can see in the logs whether there was a match by ISRC. I've also updated those platforms' descriptions to mention it. As for the strictness/accuracy: ISRC matches and exact (cleaned) title & artist matches have 100% accuracy, and anything below that is fuzzy-matched. So if you set strictness to 100%, you will only get ISRC and exact matches.
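
The accuracy rule described above can be sketched as follows. The thread only states that ISRC matches and exact (cleaned) title & artist matches score 100% and that everything else is fuzzy-matched; the cleaning rules used here (lowercase, keep alphanumerics, collapse whitespace), the fuzzy metric (normalized Levenshtein similarity), and the equal title/artist weighting are assumptions, not OneTagger's actual implementation. Because this fuzzy score only reaches 1.0 when both cleaned strings are identical, a strictness of 1.0 admits only ISRC and exact matches.

```rust
// Assumed cleaning step: lowercase, replace non-alphanumerics with spaces,
// collapse whitespace. OneTagger's real cleaning rules may differ.
fn clean(s: &str) -> String {
    let spaced: String = s
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    spaced.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Classic two-row Levenshtein edit distance over Unicode chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

// Similarity in [0, 1]; equals 1.0 only for identical strings.
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

struct Track<'a> {
    title: &'a str,
    artist: &'a str,
    isrc: Option<&'a str>,
}

fn accuracy(local: &Track, candidate: &Track) -> f64 {
    // ISRC match: treated as 100% accuracy.
    if let (Some(a), Some(b)) = (local.isrc, candidate.isrc) {
        if a.eq_ignore_ascii_case(b) {
            return 1.0;
        }
    }
    let (lt, ct) = (clean(local.title), clean(candidate.title));
    let (la, ca) = (clean(local.artist), clean(candidate.artist));
    // Exact cleaned title & artist: also 100%.
    if lt == ct && la == ca {
        return 1.0;
    }
    // Otherwise fuzzy: assumed equal weighting of title and artist.
    (similarity(&lt, &ct) + similarity(&la, &ca)) / 2.0
}

// A candidate is accepted when its accuracy reaches the strictness threshold.
fn is_match(local: &Track, candidate: &Track, strictness: f64) -> bool {
    accuracy(local, candidate) >= strictness
}

fn main() {
    let local = Track { title: "Some Track (Original Mix)", artist: "Artist", isrc: None };
    let exact = Track { title: "some track original mix", artist: "ARTIST", isrc: None };
    let close = Track { title: "Some Track (Extended Mix)", artist: "Artist", isrc: None };
    for &(name, cand) in [("exact", &exact), ("close", &close)].iter() {
        println!(
            "{}: accuracy {:.3}, match at 100%: {}",
            name,
            accuracy(&local, cand),
            is_match(&local, cand, 1.0)
        );
    }
}
```
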

As for the developer info:

I hope this helps. Thanks