fiso64 / slsk-batchdl

A batch downloader for Soulseek
GNU General Public License v3.0
216 stars, 16 forks

Files are not skipped during consecutive runs #51

Closed Rufusnu closed 1 week ago

Rufusnu commented 2 months ago

The software downloads near-identical files (sometimes differing by only one character in the name, with the same format and size) even when using --skip-existing.

This is the config:

--user "" --pass "" --path "/.../sldl/downloads/" --name-format "{title( - )artist|filename}" --m3u "all" --youtube-key "" --get-deleted --format "flac,wav,aiff,mp3,m4a" --pref-format "flac,wav,aiff" --min-bitrate 320 --pref-min-bitrate 1411 --min-samplerate 48 --min-bitdepth 16 --pref-min-bitdepth 24 --skip-existing --skip-mode "m3u" --music-dir "/.../sldl/downloads/" --remove-brackets --desperate --yt-dlp --yt-dlp-argument "--config-location /.../yt-dlp/yt-dlp.conf" --concurrent-downloads 8 --display-mode "simple" --strict-conditions

Command ran: sldl "https://music.youtube.com/playlist?list=PLmKyWcf1T8nzZ53Vy33rqQiZ8NcN4xAda&si=1Z2xYY2xcM2qgXGy" --config ./sldl.conf

*sorry if it is an issue of me using the software wrong!

fiso64 commented 2 months ago

Does it create an m3u file in the output directory?

If so, please send its contents

Rufusnu commented 2 months ago

Yes, it does. After running the program 3 times on the same playlist, it downloaded the file in 3 different versions, but the m3u file always contained only 1 entry for it, shown as successful rather than failed.

Also, this happened on a clean run (output directories empty). Instead of the normal output, there are a lot of empty lines. Maybe it helps.

If more info is needed, please let me know.

fiso64 commented 2 months ago

Yeah seems like it's broken for some reason. I don't feel like debugging this though because I'm currently working on an update where the m3u editing and skip existing logic will be rewritten anyways. Sorry for that.

If you absolutely need track skipping to work you could export the youtube playlist to a csv file and then run sldl without --skip-existing but with --remove-from-source. This should be much more reliable.

Rufusnu commented 2 months ago

That's also good news! Thanks for the advice and best of luck with implementing the rest of it!

fiso64 commented 2 months ago

I'll keep it open, it still needs to be fixed

fiso64 commented 1 month ago

This is fixed after the latest commits. You can wait for the updated binaries or build it now yourself

delpetra commented 1 week ago

Apologies, but I think I'm still having this issue. My input is a csv, so I know I could circumvent by using --remove-from-csv but I plan to use this to download multiple different csvs and retain the output as different m3us (so as to recreate playlists in another music library, without generating duplicate files).

Example output m3u8 is below - you can see it has correctly identified the first track is already downloaded in the 'test' folder, but it hasn't identified the following one for some reason (and subsequent runs of the program still cannot identify this song in particular). A couple of tracks I've downloaded so far seem to have this problem but I can't work out what's in common. I thought it was the brackets at first, but this doesn't seem to be the issue.

sldl test7.csv --user X --pass X --pref-format mp3 --pref-length-tol 10 --pref-min-bitrate 319 --pref-max-bitrate 321 --pref-max-samplerate 48000 --pref-strict-title true --pref-strict-album true --pref-accept-no-length false --input-type csv --m3u all --skip-existing --skip-mode name --music-dir "C:\Users\Dan\OneDrive\Music\DJ\Utilities\slsk-batchdl"

#SLDL:C:\Users\Dan\OneDrive\Music\DJ\Utilities\slsk-batchdl\test\Paul Johnson - Get Get Down.mp3,Paul Johnson,Get Get Down - EP,Get Get Down,6,0,3,0;./08.  Draxx (ITA)  -  Get Ur Freak On (Extended Mix).mp3,Draxx (ITA),Get Ur Freak On - Single,Get Ur Freak On (Extended Mix),5,0,1,0;"./Rosalie, James Mac, VALL - The Boy Is Mine feat. Rosalie (Club Mix).mp3",James Mac & Vall,The Boy Is Mine (feat. Rosalie) - Single,The Boy Is Mine (feat. Rosalie),3,0,1,0;

C:\Users\Dan\OneDrive\Music\DJ\Utilities\slsk-batchdl\test\Paul Johnson - Get Get Down.mp3
08.  Draxx (ITA)  -  Get Ur Freak On (Extended Mix).mp3
Rosalie, James Mac, VALL - The Boy Is Mine feat. Rosalie (Club Mix).mp3

fiso64 commented 1 week ago

Can you post the csv file, or the lines that are necessary to reproduce this?

The m3u looks fine to me, the first track seems to be in the test folder while the subsequent tracks are probably in the test7 folder. The ./ in the path ./08. Draxx (ITA) - Get Ur Freak On (Extended Mix).mp3 tells it that it's located in the output folder, which should be test7 since that's the name of the csv file.
Is this correct?

delpetra commented 1 week ago

It successfully redownloads the file into the new folder (test3) but is unable to identify the file if it has already been downloaded, leading to duplicates.

In this case, the Draxx track was already in a folder 'test2' but was re-downloaded into the 'test3' folder. Ignore the fact my code snippet says test7 - it should say test3, to match this csv

test3.csv

fiso64 commented 1 week ago

I just realized that you're using --skip-mode name (different from op who used m3u which should work now) which isn't really recommended anymore unless you want to skip preexisting files that weren't downloaded by sldl. If you use name skipping then the m3u index will not be read at all to skip files, and will only be written to with whatever the name skip mode finds.

I realize now that it's probably better to have the index feature and the playlist generation feature as independent CLI arguments. That way you would be able to create a separate m3u for every csv while keeping track of all downloaded tracks in a global index file.

I'll try to figure out why name skip-existing doesn't work later today though.

delpetra commented 1 week ago

Apologies - yes, I should have clarified. I’m deliberately using ‘skip-mode name’ so that I can reference a library of existing tracks I don’t want to redownload. Agreed that separately generating playlist and index file would be incredibly useful and is exactly how I was hoping to use the tool! I imagine, then, one could manually update the index based on an existing library prior to using the tool for the first time, enabling you to use the more reliable ‘skip-mode m3u’.

fiso64 commented 1 week ago

Okay, turns out there was a little issue that is now fixed, but the third track is still not getting skipped. The reason is that in the csv, the artist is James Mac & Vall whereas the downloaded file has them separated by commas (at least in my case). It's one of the many special cases that this mode does not handle.

What you could do for now is pass a name-format argument, e.g. --name-format "{sartist} - {stitle}", to set the file names to the values as they appear in the csv. That way when comparing the csv fields against the file names, it should always find a match (probably).
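To illustrate why the third track is missed, here is a deliberately simplified sketch of name-based matching. It is not sldl's actual logic: the function `name_matches` and the "both fields must appear in the file name" rule are assumptions made only for this example. The artist, title and file name are taken from the index line posted above.

```python
def name_matches(artist: str, title: str, filename: str) -> bool:
    """Hypothetical check: do both csv fields appear verbatim in the file name?

    This is an illustration only, not the comparison sldl really performs.
    """
    name = filename.lower()
    return artist.lower() in name and title.lower() in name


csv_artist = "James Mac & Vall"
csv_title = "The Boy Is Mine (feat. Rosalie)"

# File name as it was downloaded: the artists are comma-separated and
# "feat. Rosalie" has no brackets, so neither csv field appears verbatim.
downloaded = "Rosalie, James Mac, VALL - The Boy Is Mine feat. Rosalie (Club Mix).mp3"

# File name built directly from the csv values, "<artist> - <title>".
from_csv = f"{csv_artist} - {csv_title}.mp3"

print(name_matches(csv_artist, csv_title, downloaded))
print(name_matches(csv_artist, csv_title, from_csv))
```

Under this toy rule the first file is not recognised while the second is, which is the reason for naming files after the csv values.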

fiso64 commented 1 week ago

New version adds separate options for playlist and index creation: https://github.com/fiso64/slsk-batchdl/releases/tag/v2.3.1

delpetra commented 1 week ago

Awesome, thank you! I can see a separate index and m3u are now created when you add --write-playlist, however I don't see a 'universal' index file that builds iteratively on previous ones (i.e. expands as you run subsequent commands to show the full extent of your downloaded library). I'm not sure if that was the intended behaviour or I am missing something in the new configs! (I can't seem to see anything in --help to explain)

fiso64 commented 1 week ago

You can create a universal index by setting the --index-path argument to some fixed path

delpetra commented 4 days ago

Thank you! Would it be possible to manually edit the index / library file to 'teach' it about files already downloaded, not from this tool? I can see manually editing the text could achieve this, but I can't work out the pattern of the four numbers separated by commas.

(I am assuming here that this would be more reliable going forwards than pointing to a 'skip-music-dir' and relying on filenames)

This manual editing would also be useful, e.g. to adjust file extensions where you have converted a song after downloading it.

fiso64 commented 4 days ago

The format is documented here https://github.com/fiso64/slsk-batchdl/blob/e74dc4beda4947fafdfa39cbab2b68c11b95b76e/slsk-batchdl/M3uEditor.cs#L50-L52

and the meaning of every number can be found here. If any of the string fields (filepath, artist, etc) contain a comma, then the entire field should be wrapped in quotes ". If a quoted field itself has any quotes, then they should be duplicated.
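The quoting rules above can be sketched in Python for anyone editing the index by hand. This is a minimal sketch, not code from sldl: the helper names `quote_field`, `format_entry` and `parse_entries` are made up, the four trailing integers are passed through without interpretation (see the linked source for their meaning), and it assumes entries end with `;` as in the index line posted earlier in this thread.

```python
PREFIX = "#SLDL:"


def quote_field(value: str) -> str:
    """Wrap a string field in quotes if it contains a comma, doubling inner quotes."""
    if "," in value:
        return '"' + value.replace('"', '""') + '"'
    return value


def format_entry(path, artist, album, title, numbers) -> str:
    """Build one entry: four string fields, then four integers, then ';'."""
    fields = [quote_field(f) for f in (path, artist, album, title)]
    fields += [str(n) for n in numbers]
    return ",".join(fields) + ";"


def parse_entries(line: str) -> list:
    """Split an index line into entries, each a list of field strings.

    Assumption: a ';' outside quotes always ends an entry, so an unquoted
    field containing ';' would be split incorrectly by this sketch.
    """
    if not line.startswith(PREFIX):
        raise ValueError("not an index line")
    text = line[len(PREFIX):]
    entries, fields, buf = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"' and text[i + 1:i + 2] == '"':
                buf.append('"')  # doubled quote -> one literal quote
                i += 1
            elif ch == '"':
                in_quotes = False  # closing quote
            else:
                buf.append(ch)
        elif ch == '"' and not buf:
            in_quotes = True  # opening quote at the start of a field
        elif ch in ",;":
            fields.append("".join(buf))
            buf = []
            if ch == ";":
                entries.append(fields)
                fields = []
        else:
            buf.append(ch)
        i += 1
    return entries


entry = format_entry(
    "./Rosalie, James Mac, VALL - The Boy Is Mine feat. Rosalie (Club Mix).mp3",
    "James Mac & Vall",
    "The Boy Is Mine (feat. Rosalie) - Single",
    "The Boy Is Mine (feat. Rosalie)",
    (3, 0, 1, 0),
)
parsed = parse_entries(PREFIX + entry)
```

Here `entry` reproduces the third entry of the index line shown earlier, with only the file path quoted because it is the only field containing commas.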

You may not need it though. I think (not 100% sure) that if you give it a skip dir once and run it then the found file paths will be automatically written to the index.

delpetra commented 2 days ago

You are right - the found files get written to the index. Brilliant!

Another question (apologies) - is there still a way to create a log of the files not found on Soulseek?

fiso64 commented 2 days ago

No problem. You have a couple of options:

  1. All downloads (failed or not) are logged in the index, so you could parse that, filtering the appropriate failure reason (NoSuitableFileFound = 3)
  2. Failed downloads are also written to the m3u playlists as comments (along with the failure reason as a string), which is more readable. These lines always start with #FAIL
  3. You could use --on-complete together with a simple script that takes the track metadata info and failure reason as arguments, checks whether the failure reason is NoSuitableFileFound, and writes the metadata to a text file if so.
  4. You could join the csv files into one long file, then run sldl tracks.csv --skip-not-found --print tracks. This will print tracks that weren't found during prior runs in a separate section.
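Option 2 is probably the easiest to script. The sketch below collects the `#FAIL` comment lines from the text of an m3u playlist and keeps those that mention NoSuitableFileFound. The function name `not_found_lines` is made up, and the sample lines are hypothetical: only the `#FAIL` prefix and the presence of the failure reason as a string are stated above, so the rest of each sample line is an assumed layout.

```python
def not_found_lines(m3u_text: str, reason: str = "NoSuitableFileFound") -> list:
    """Return the '#FAIL' comment lines of an m3u that mention the given reason."""
    return [
        line.strip()
        for line in m3u_text.splitlines()
        if line.startswith("#FAIL") and reason in line
    ]


# Hypothetical playlist contents; the exact layout of '#FAIL' lines is assumed.
sample = """\
Artist A - Song A.mp3
#FAIL: Artist B - Song B [NoSuitableFileFound]
#FAIL: Artist C - Song C [SomeOtherReason]
"""

for line in not_found_lines(sample):
    print(line)
```

For a real playlist, the text could be read with `pathlib.Path(playlist_path).read_text(encoding="utf-8")` and the result written to a log file.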