MinisculeGirraffe / tdl

Fast, Concurrent, Rust based Tidal-Media-Downloader implementation.
MIT License

Adding extra tokens: album artist #16

Closed poland153 closed 2 years ago

poland153 commented 2 years ago

Hi, I've been using tdl recently and it's been super fast and solid on Ubuntu. Is there any chance you could add a new token for album artist? This is how I organize my music, and it would be great if that metadata could be saved to each track and if I could also use it like {album_artist} so my folders organize more cleanly.

I also noticed that the year attribute isn't being saved to each track either, but that might be a different issue.

MinisculeGirraffe commented 2 years ago

Howdy bud. Actually yeah, that totally should be possible. Any of the existing tokens can be reused anywhere in the file path. Something like {album}_{artist} should work at any place in the naming string. Can you provide an example of your naming scheme?
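To make the token idea concrete, here is a minimal sketch of how `{token}` placeholders in a naming string could be substituted. It is not tdl's actual implementation; the function name `render_template`, the use of a `HashMap` for token values, and the choice to leave unknown tokens untouched are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Replace every `{token}` in `template` with its value from `values`.
/// Hypothetical helper: unknown tokens are left as-is so that a typo
/// stays visible in the resulting path instead of silently vanishing.
fn render_template(template: &str, values: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;

    while let Some(start) = rest.find('{') {
        // Copy the literal text that precedes the opening brace.
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];

        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match values.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Unknown token: emit it unchanged.
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unmatched '{': copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }

    out.push_str(rest);
    out
}
```

With `album` and `artist` present in the map, a naming string such as `{artist}/{album}_{artist}` resolves each token wherever it appears, including repeated uses of the same token.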

The way the program currently names files is less than desirable (IMO) and does need a rework, though. Here's some background on why it is the way it is.

Since the program is multithreaded and can download multiple tracks/albums/artists at the same time, there's no defined order in which the downloads will actually arrive. If you're downloading an entire artist, each download the program queues may be from an entirely different album. This is much faster because there's no delay between one album finishing and the next download starting; otherwise we would have to wait on multiple API lookups before queueing up the next download (a problem that greatly annoyed me with the Python script).

The Tidal API doesn't return full information for a track/album/artist when you make a request on another resource. I.e., if you request all the tracks in an album, you only get partial details about each track, and if you request a specific track, you only get partial details about the album it belongs to. The problem is that when it comes time to decide where a track should go in the file system, only a subset of the information is available. Because of the above optimization, the context of the album is lost by the time the track is processed, which severely limits the information available when deciding its file system path. To provide the best experience, you would really need to make three API calls to look up the track, album, and artist details. That is also incredibly slow, even if all three requests are awaited concurrently.

The solution for this, and part of the work that went into the last update of the program, is to maintain a request cache on disk and in memory. This way the program has access to all the information anywhere in the program. All further API lookups can be served from disk or memory, with only the first lookup incurring any significant time cost. From what I've seen in my testing, API responses carry a Cache-Control header with a lifetime of 5 minutes, which tdl's request cache respects. If you're downloading multiple albums from an artist, you've got a 5-minute window of practically free API lookups.

So now, when it's time to generate an item's file path, we can use this request cache to pull all the information possible without causing a slowdown or increasing the number of 401s when hitting the Tidal API. This should allow for a much more robust naming token system, without any of the performance overhead.
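As a rough illustration of the caching idea, the sketch below keeps responses in memory and treats them as fresh until a per-entry lifetime expires. It is not tdl's real cache: the `RequestCache` and `CachedResponse` types, the URL-keyed map, and passing `now` in explicitly (to keep the example deterministic) are assumptions, and the disk layer is left out entirely.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One cached API response plus the freshness lifetime the server allowed.
struct CachedResponse {
    body: String,
    stored_at: Instant,
    max_age: Duration,
}

/// Hypothetical in-memory request cache keyed by request URL.
struct RequestCache {
    entries: HashMap<String, CachedResponse>,
}

impl RequestCache {
    fn new() -> Self {
        RequestCache { entries: HashMap::new() }
    }

    /// Store a response body together with its allowed lifetime.
    fn insert(&mut self, url: &str, body: String, max_age: Duration, now: Instant) {
        let entry = CachedResponse { body, stored_at: now, max_age };
        self.entries.insert(url.to_string(), entry);
    }

    /// Return the cached body only while it is still fresh.
    fn get(&self, url: &str, now: Instant) -> Option<&str> {
        let entry = self.entries.get(url)?;
        let age = now.saturating_duration_since(entry.stored_at);
        if age < entry.max_age {
            Some(entry.body.as_str())
        } else {
            None
        }
    }

    /// Serve from the cache when possible; otherwise call `fetch`
    /// (a stand-in for the real HTTP request) and remember the result.
    fn get_or_fetch<F>(&mut self, url: &str, now: Instant, fetch: F) -> &str
    where
        F: FnOnce() -> (String, Duration),
    {
        if self.get(url, now).is_none() {
            let (body, max_age) = fetch();
            self.insert(url, body, max_age, now);
        }
        self.entries[url].body.as_str()
    }
}
```

With a lifetime of `Duration::from_secs(300)`, a second `get_or_fetch` for the same URL inside the 5-minute window never invokes the `fetch` closure, which is what makes repeated album and artist lookups close to free.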

I should have some time over the next couple days to work on this, and hopefully put another release out by Monday.

poland153 commented 2 years ago

Thanks for the quick reply. My music is organized in foobar2000 by this album naming scheme:

/{AlbumArtist}/{AlbumReleaseYear} - {AlbumName}/{DiscNumber}-{TrackNumber} {TrackName}

Normally {Artist} works fine for an album if all the tracks are performed by them. But if I download an album that has a track performed by someone other than the main artist, those tracks are moved out of the main artist folder and into their own artist folder, breaking the album up across multiple places.

Since I want all the tracks for an album in one place, {AlbumArtist} fixes that by adding a layer that says: even though other artists appear on some tracks, that doesn't matter. They're all performing for the main artist on this one album, so the album artist/creator takes precedence.

Thanks, that was very informative; it's a pretty good way to do multiple downloads. Yeah, the only thing missing is holding onto that album data when you're downloading tracks. A little off topic, but I noticed that Tidal-Media-Downloader has a multithreaded option. Not sure how new that is, but it would be interesting to see if they did things differently.

MinisculeGirraffe commented 2 years ago

Gotcha. Yeah, that would fall under the situation I described: secondary information that would need to be pulled from the API. Shouldn't be that hard to write in.

As for the tidal-media-downloader multithreaded option, I've looked through their code quite a bit, and the implementations of concurrency are quite different between the two projects.

There are two kinds of optimizations that can be performed in an I/O-bound task like this.

The tidal-media-downloader implementation uses synchronous concurrency, while tdl uses asynchronous concurrency. So for every single task, they have to spawn a new system thread. Using that many system threads is resource intensive, both for CPU (due to context switching) and memory. Generally, your CPU can't do more things simultaneously than it has available cores, so if you're spawning 10 threads on a dual-core CPU, only 2 will ever actually be executing at the same time. Async code is where the big performance improvements are gained, since it allows you to do other things while you're waiting for a web request to finish, without having to context switch to a different thread.

This implementation uses Tokio as an asynchronous runtime. Tokio will spawn as many system threads as there are CPU cores, and then schedule code execution on an available system thread. When a system thread isn't in use, it will attempt to steal pending tasks from the other threads (work stealing). This pattern is called green threading, and it lets us spawn thousands of process-level "threads" with practically zero overhead.

The vast majority of the execution time in tdl (over 90%) is spent waiting on I/O, either from the network or elsewhere. Even a single-threaded execution of tdl is going to be significantly faster, because the asynchronous code can perform other actions while we're waiting for an HTTP request to finish.

Not to mention that since Rust is a compiled language, the calculations are going to be inherently faster than in Python. In the benchmarks I've performed, all of this has led to over 100x less CPU time when executing the same tasks.

sn-guthub commented 2 years ago

For example, https://listen.tidal.com/album/221358486. I end up with tracks 1, 2, and 3 under artist Mats Gustafsson, and tracks 4-10 under artist Jim O'Rourke.

But looking at the JSON debug output, I don't know how you'd decide to put this album under Jim O'Rourke.

MinisculeGirraffe commented 2 years ago

I got some time this weekend and rewrote how the naming system gets the file paths, which should fix these issues and add quite a few more options for file naming.

It's still a WIP, and there are a couple more things I want to work out with it, but I'll be merging #21 probably later tonight or tomorrow.

MinisculeGirraffe commented 2 years ago

Closing this out. Authored v0.3.0 with the changes in place.

This release does include a pretty big change to how files are named, so I would recommend reviewing your configs before upgrading.

The default behavior is now to always prefer the album artist and, if for whatever reason it's not available, to fall back to the track artist.
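The fallback rule described above can be sketched in a few lines. This is only an illustration, not the code shipped in v0.3.0: the `TrackInfo` struct, its field names, and the decision to treat an empty album artist as missing are assumptions.

```rust
/// Hypothetical subset of the metadata available when naming a file.
struct TrackInfo {
    track_artist: String,
    /// `None` when the API response carried no album artist.
    album_artist: Option<String>,
}

/// Pick the artist used for the folder name: prefer the album artist,
/// and fall back to the track artist when it is missing or empty.
fn folder_artist(track: &TrackInfo) -> &str {
    track
        .album_artist
        .as_deref()
        .filter(|name| !name.is_empty())
        .unwrap_or(track.track_artist.as_str())
}
```

Under this rule, a guest track on someone else's album resolves to the album artist's folder, so the album stays in one place, while a track with no album artist still lands under its own performer.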