Document and share discogs cases

JohnRichardWalker commented 3 years ago

Scott, I've been working on the same idea as you at the same time, and I'm raising an 'issue' so we can see if we might collaborate. As a minimum, we could share odd Discogs listings that don't work with each other's code. My program copes with at least one Discogs edge case: a classical album that consists of a work (which has a length) broken down into movements, which also have lengths. Your code, on the other hand, produces the .xml file to populate the ID3 tags, which I'd not even thought to do.

Two significant differences in my approach: I get the data in .json form, which in my experience has been easier to parse than the HTML, and I've written it (so far) in C++, but could be persuaded to learn Python in a good cause.

ScottCh1 commented 3 years ago

Hello John (if I have that right),

It's good to see that someone has noticed my project! I had hoped that Audacity users would be enjoying it by now, but the response was pretty dismal in the places where I announced it. Total silence, crickets. Where did you find it listed, if you don't mind my asking?

I have written a substantial amount of C/C++ before; in fact, it was the first half of my career. I went with Python for this project because of past experience with the BeautifulSoup library, which is a nearly effortless tool (if you work with Python, anyway) for web scraping. I was able to extract the track list and tag data much more easily than I initially expected, thanks to the copious documentation and substantial refinement that has gone into this module.

Have you given the Python script a test drive yet? It's worked on every album I've tried so far, out of close to a hundred. But as you noted, I had to work around the one listing I found that used sub-track referencing (or movements). There are usually multiple versions of each album on Discogs, and one of them typically has a tracklist page that lists the movements as individual tracks. I don't think that problem would be hard to solve, had the need for it been expressed. But as I said, scant response so far.
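For reference, the sub-track case discussed above can be sketched in Python as a flattening step. This is only an illustration, not code from either project. It assumes the tracklist has already been loaded from JSON as a list of dicts, and that a work entry carries its movements under a `sub_tracks` key with a `type_` field marking the entry kind; those field names are an assumption about the JSON layout, and the sample data is made up.

```python
def flatten_tracklist(tracklist):
    """Return a flat list of (title, duration) pairs, one per playable track.

    Assumed layout: each entry is a dict with "title" and "duration";
    a work entry additionally has "sub_tracks", a list of movement dicts.
    """
    flat = []
    for entry in tracklist:
        sub_tracks = entry.get("sub_tracks")
        if sub_tracks:
            # The parent entry is the work: emit one row per movement,
            # prefixed with the work title so the label stays unambiguous.
            for sub in sub_tracks:
                title = f'{entry["title"]}: {sub["title"]}'
                flat.append((title, sub.get("duration", "")))
        elif entry.get("type_", "track") == "track":
            # Ordinary track with no movements.
            flat.append((entry["title"], entry.get("duration", "")))
        # Anything else (for example a side heading) is skipped.
    return flat


# Made-up sample data in the assumed layout.
sample = [
    {"type_": "index", "title": "Symphony No. 5", "duration": "17:25",
     "sub_tracks": [
         {"type_": "track", "title": "I. Allegro", "duration": "7:20"},
         {"type_": "track", "title": "II. Andante", "duration": "10:05"},
     ]},
    {"type_": "track", "title": "Overture", "duration": "8:15"},
]

if __name__ == "__main__":
    for title, duration in flatten_tracklist(sample):
        print(duration, title)
```

The work's own length is dropped here in favour of the movement lengths; a variant could keep it to check that the movements add up.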

The most noteworthy challenge I have had to address so far is the large number of albums posted to discogs.com with no track times. I created a workaround for that; you may have noticed it in my doc. I use a dumb estimation process that divides the length of the recording (converted to seconds) by the number of tracks, then spaces the tracks evenly from the beginning to the end. This is far from ideal, and requires a fair amount of manual tweaking to get from there to "Export Multiple".
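A minimal sketch of that even-spacing estimate, for anyone comparing approaches. It is not the code from the script itself; the function names are made up for illustration, and the output assumes Audacity's tab-separated label text format (start, end, label), written here as point labels where start equals end.

```python
def parse_length(text):
    """Convert 'MM:SS' or 'HH:MM:SS' to a whole number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def evenly_spaced_starts(titles, total_seconds):
    """Give every track an equal share of the recording.

    Returns a list of (start_seconds, title) pairs.
    """
    if not titles:
        return []
    slot = total_seconds / len(titles)
    return [(index * slot, title) for index, title in enumerate(titles)]


def to_label_text(starts):
    """Format point labels as tab-separated lines: start, end, label."""
    return "".join(
        f"{start:.6f}\t{start:.6f}\t{title}\n" for start, title in starts
    )


if __name__ == "__main__":
    total = parse_length("40:00")
    print(to_label_text(evenly_spaced_starts(["A", "B", "C", "D"], total)), end="")
```

Every estimated boundary still has to be dragged to the real gap between tracks by hand, which is the manual tweaking mentioned above.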

How are you handling such track lists?

I'm also interested in how you are retrieving JSON for the discogs.com album track lists. Are you using their web API? Or converting the HTML pages? I looked into their dev API, but I found there are so many track lists on Discogs for any given moderately popular album that it was easier to look for the right one manually with my web browser. It seemed nearly impossible to identify the right version of the record's tracklist through an automated querying process, even with all of the tag data available.

What complicates this problem even more is that for many popular albums there is a master or default listing. This would appear to be the go-to track list for each album, except that these often do not have track times (even if all of the other versions of the album have them).

The biggest limitation of using Python on this project seems to be that setting up and running Python scripts takes some effort for a fair percentage of the Audacity-using public. It is easy on Linux and other Unix variants, but requires work on Windows and macOS. With a C program you at least have the option of downloading and installing a compiled executable, assuming that the providing site is trusted. Do you have a means of distribution arranged? Will it be distributed as source, e.g. on GitHub? If so, the same limitation for the user base exists.

It would be better if this kind of program could be released as an integration for Audacity, as a plugin for example. But we don't have that option. Audacity already has a development API, which (as you may be aware) supports automatic creation of labels. Is that level of integration part of your project? Unfortunately, there is no support yet for automating the inclusion of the metadata tags. Even so, it is tempting to create full automation that would allow the Audacity user to query Discogs for the album and automatically label a completed recording. We have many of the components needed to perform that process already, but the remaining work would be substantial.

Regardless of my speculation, I'm not convinced that there's enough of a user base waiting for this work to be made available to be worth the investment in development time. What's your take? Do you have insight into the level of demand for the kind of software we're building?

Thanks,

Scott C.

Sent: Wednesday, May 05, 2021 at 1:37 PM From: "JohnRichardWalker" To: "ScottCh1/vinyl-ripper-helper" Subject: [ScottCh1/vinyl-ripper-helper] Document and share discogs cases (#3)

JohnRichardWalker commented 3 years ago

Scott,

Thanks for the reply. I will give a detailed response in the next 24 hours or so. I plan to put my own code onto GitHub in the next day or two. There is another project at https://github.com/MartinBarker/vinyl2digital which I will be looking at. It's possible to get the Discogs .json by doing curl https://api.discogs.com/releases/. I used a JSON library from Niels Lohmann; it serves the same sort of purpose as your Beautiful Soup does for Python, but of course it's not so easy to master.

John Walker
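The same request can be sketched in Python with only the standard library, for comparison with the curl call above. This is an illustration under assumptions: `release_id` stands for whatever release number is appended to the URL, the User-Agent string is a made-up placeholder (sending an identifying one is an assumption about what the API expects), and the field names `title`, `tracklist`, `position` and `duration` are assumed from the JSON layout rather than confirmed in this thread.

```python
import json
import urllib.request

API_ROOT = "https://api.discogs.com/releases/"


def build_request(release_id):
    """Prepare the GET request for one release (release_id is hypothetical)."""
    return urllib.request.Request(
        f"{API_ROOT}{release_id}",
        # Placeholder identifier; replace with your own application name.
        headers={"User-Agent": "discogs-cases-example/0.1"},
    )


def parse_release(raw_json):
    """Pull the album title and (position, title, duration) rows out of the JSON."""
    release = json.loads(raw_json)
    tracks = [
        (t.get("position", ""), t.get("title", ""), t.get("duration", ""))
        for t in release.get("tracklist", [])
    ]
    return release.get("title", ""), tracks


def fetch_release(release_id):
    """Download and parse one release; needs network access."""
    with urllib.request.urlopen(build_request(release_id), timeout=30) as response:
        return parse_release(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means odd listings can be saved as .json files and shared as test cases without hitting the network.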
