JMPerez / spotify-dedup

Remove duplicates from your Spotify Playlists
https://spotify-dedup.com
MIT License

Dedup Large Number of Playlists #26

Open shelaffs opened 6 years ago

shelaffs commented 6 years ago

When processing a large number of playlists (the counter starts at 100; I have 92 user playlists, +9 Spotify defaults), the search always seems to get stuck with 18 playlists remaining to process, even after waiting a long time (10 minutes to 1 hour).

The search does process some playlists, but the "Remove duplicates" button does not respond when pressed on the ones that were processed, although the duplicates do appear to get removed.

The issue has been present for at least 3 weeks.

JMPerez commented 6 years ago

Could you share a link to the playlist that gets stuck?

Also, if it does delete duplicates, does that mean it eventually stops finding duplicates if you run it several times?

JMPerez commented 6 years ago

Just for completeness, there are corner cases that are difficult to fix because they are hard to reproduce.

shelaffs commented 6 years ago

I understand. I'm not sure which playlist causes it to get stuck. I have several with 5,000+ songs and, as far as I could tell, that was never an issue until a few weeks ago. I tried deleting 4 playlists to get back under 100 total, but it still hangs at 18 playlists left to process. I have also tried rearranging the playlists. Perhaps it is the large playlists, or the number of large playlists that I have?

Typical results:

Starts at "still to process 100 playlists" but quickly (<2 seconds) goes down to 85 [screenshot: 2018-10-22_14h19_59]

~1 minute later it displays "still to process 18 playlists" and hangs until I close the web page [screenshot: 2018-10-22_14h21_04]

Testing:

I added a single duplicate to one of the last (small) playlists in my list, and it was detected, but the "18 playlists" issue still appears [screenshot: 2018-10-22_14h59_02]

I noticed the button did respond in this case and the duplicate was removed, but the "18 playlists" issue remains [screenshot: 2018-10-22_15h03_50]

I then added a duplicate to one of my large playlists (>5,000 songs), and the dedup process did not pick it up, even after waiting for over 10 minutes [screenshot: 2018-10-22_15h09_16]

I added a duplicate to my saved songs, as this has been reliable in the past, and it did pick up that duplicate even though the "playlist" has over 6,000 tracks [screenshot: 2018-10-22_15h17_49]

If it's due to the size of the user playlists, I can accept that, but I wanted you to be aware of the issue. I don't recall it happening until the past few weeks, and I have had playlists this large for a while without noticing any problem (other than Spotify's 10,000-track limit).

I appreciate you looking into this!

shelaffs commented 6 years ago

Additional testing: browsers also seem to behave differently.

Firefox (Windows 10, my usual browser) jumps from 100 playlists left to process to 85, then to 18, and hangs there.

Chrome (Windows 10) shows each number as it counts down from 100 and hangs at 21 left to process (on both desktop and the iPhone SE mobile app).

Safari (iPhone SE mobile) behaves like Chrome and also hangs at 21.

Thanks!

JMPerez commented 6 years ago

Wow, thank you so much for the detailed report, @shelaffs!

I tried to find out where the problem was by checking Sentry, the error reporting tool I use, but I can't see any recent errors there. I could try to reproduce the problem if we knew which playlist causes it, by creating one with the same content, but first we need to identify the faulty one. If you can open the Network panel in your browser's developer tools, there should be an error there that could help with debugging.

I think the best way to solve this would be to add more error reporting and error handling and then try again. My guess is that there is an error fetching tracks from one of the playlists. Looking at the code, there is no error handling around the calls to getTracks(), so if a single request for a page of tracks fails, the whole thing breaks.
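To illustrate the kind of error handling described above, here is a minimal TypeScript sketch (not the app's actual getTracks() code) of paging through a playlist's tracks, where each page request is retried a few times and, if it still fails, the error records which playlist and which range of tracks was affected. The fetchPage callback, the Page shape, PageFetchError, the exponential backoff and the page size of 100 are all assumptions made for illustration.

```typescript
// Hypothetical shape of one page of playlist items (an assumption for this sketch).
interface Page<T> {
  items: T[];
  next: string | null; // URL of the next page, or null on the last page
}

// Hypothetical page fetcher; the real app's request code is not shown in this thread.
type FetchPage<T> = (playlistId: string, offset: number, limit: number) => Promise<Page<T>>;

// Error that records which playlist and which slice of tracks could not be fetched.
class PageFetchError extends Error {
  readonly playlistId: string;
  readonly offset: number;
  readonly reason: unknown;

  constructor(playlistId: string, offset: number, pageSize: number, reason: unknown) {
    super(`Could not fetch tracks ${offset}-${offset + pageSize - 1} of playlist ${playlistId}`);
    this.name = "PageFetchError";
    this.playlistId = playlistId;
    this.offset = offset;
    this.reason = reason;
  }
}

// Retry an async operation, waiting baseDelayMs, 2*baseDelayMs, ... between attempts.
async function withRetry<R>(op: () => Promise<R>, maxAttempts: number, baseDelayMs: number): Promise<R> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Fetch every page of a playlist. A page that still fails after all retries
// rejects with a PageFetchError instead of silently stalling the caller.
async function fetchAllTracks<T>(
  playlistId: string,
  fetchPage: FetchPage<T>,
  pageSize = 100, // assumed maximum page size
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T[]> {
  const tracks: T[] = [];
  let offset = 0;
  for (;;) {
    const pageOffset = offset;
    const page = await withRetry(() => fetchPage(playlistId, pageOffset, pageSize), maxAttempts, baseDelayMs).catch(
      (err: unknown) => {
        throw new PageFetchError(playlistId, pageOffset, pageSize, err);
      },
    );
    tracks.push(...page.items);
    if (page.next === null || page.items.length === 0) return tracks;
    offset += page.items.length;
  }
}
```

Because the error carries the page offset, a failure like the ones seen in a browser's network panel could be traced to a specific range of tracks within a playlist, not just to the playlist as a whole.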

I'll try to find some time to add more error handling that shows on the page when there was a problem with a certain playlist.

shelaffs commented 6 years ago

Okay, I ran it in Firefox and Chrome. I don't know how to download the report, but everything looked fine in Firefox except for a 500 error at the very beginning that didn't seem to affect much. It ran for almost 10 minutes and transferred 1 GB of data, which seems quite high, but I'm not sure what's normal [screenshot: 2018-10-23_14h07_03]

I then ran it in Chrome: it transferred much less data and returned a good number of errors [screenshot: 2018-10-23_14h24_27]. They all appear to be the same error, but for different playlists.

The playlists affected, in the order the errors appeared:
- https://open.spotify.com/playlist/4cQ71zUu9D5MHa7eglNbSt
- https://open.spotify.com/playlist/1uwoG5mqwu12MIHMCLcvuj
- https://open.spotify.com/playlist/6zwMDTtbSOFZ1xeJq4o7ng
- https://open.spotify.com/playlist/5zHaGzuo56Yq3mug0I9o6F
- https://open.spotify.com/playlist/1QOJ8soBpYObekO37nhla7 (2 errors)
- https://open.spotify.com/playlist/6zwMDTtbSOFZ1xeJq4o7ng
- https://open.spotify.com/playlist/5RsQ3MP1tqT7Vo8vtb7Tt2 (2 errors)
- https://open.spotify.com/playlist/2ZXXiDodQdYu6tEkRxlJ4F
- https://open.spotify.com/playlist/1rKpwnFtmsZGBImJMiZzk0 (4 errors)
- https://open.spotify.com/playlist/3b3zqaFlRrjaw3ewD0cGXo (2 errors)
- https://open.spotify.com/playlist/1BgqsEh1vR85NWCwapobAE

These are all large playlists that I add tracks to almost daily, so I'm not at all surprised to see the errors coming from these. Is there a way to isolate the tracks that are returning the error?

The error appeared to be identical or very similar for each playlist [screenshot: 2018-10-23_14h35_40]

Thank you!

JMPerez commented 5 years ago

I tried to find duplicates in https://open.spotify.com/playlist/4cQ71zUu9D5MHa7eglNbSt from my account and it found duplicated tracks without throwing any error.

I plan to work on error handling so the tool notifies you when something goes wrong and "continues" if there is an occasional issue fetching the list of tracks for a playlist. This will take some time, though, so we shouldn't expect a quick fix.
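A minimal TypeScript sketch of the "continue on failure" idea, using hypothetical names (processAllPlaylists, findDuplicates, onProgress) rather than the app's real functions: each playlist's failure is caught and recorded, and the remaining-playlists counter is decremented in a finally block, so it reaches zero even when some playlists fail instead of hanging at a number like 18.

```typescript
// Result of processing one playlist: its duplicates, or the error that stopped it.
type Outcome<D> =
  | { playlistId: string; ok: true; duplicates: D[] }
  | { playlistId: string; ok: false; error: unknown };

// Process every playlist, recording failures instead of letting one failed
// playlist leave the whole run (and its progress counter) hanging.
async function processAllPlaylists<D>(
  playlistIds: string[],
  findDuplicates: (playlistId: string) => Promise<D[]>, // hypothetical per-playlist step
  onProgress: (remaining: number) => void, // e.g. updates "still to process N playlists"
): Promise<Outcome<D>[]> {
  let remaining = playlistIds.length;
  onProgress(remaining);
  return Promise.all(
    playlistIds.map(async (playlistId): Promise<Outcome<D>> => {
      try {
        const duplicates = await findDuplicates(playlistId);
        return { playlistId, ok: true, duplicates };
      } catch (error) {
        return { playlistId, ok: false, error };
      } finally {
        // Runs on success and on failure, so the counter always reaches 0.
        remaining -= 1;
        onProgress(remaining);
      }
    }),
  );
}
```

The entries with ok set to false are the playlists that could be reported on the page as having a problem. For brevity this starts every playlist at once; a real client would probably cap concurrency to stay within Spotify's rate limits.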

shelaffs commented 5 years ago

I appreciate you looking into it. As one last test, I created a new playlist and dragged in all the tracks from the affected playlist with 4 errors (also the one with the most tracks), and duplicates were not picked up from it either, so I don't think recreating the playlists would help, but I was hopeful.

Thanks again for all your help, it's a fantastic tool =)

JMPerez commented 5 years ago

Thanks a lot to you! These conversations are what make me feel that these projects are useful and encourage me to continue working on them.

I’ll post in this thread when there's an update. In the meantime I’m cleaning up some code and improving testing and error handling for part of the app.

shelaffs commented 5 years ago

[screenshot: 2018-11-13_09h50_15]

Hello, I just wanted to report that I tried running Spotify Dedup today (left it alone for a while) and it is once again pulling duplicates from my playlists and removing them. Thanks so much for the great work!

JMPerez commented 5 years ago

Thanks for letting me know @shelaffs! I've been improving the code a little bit to do better error handling but there are still a few things to do.