Avnsx / fansly-downloader

Easy to use fansly.com content downloading tool. Written in Python, but also ships as a standalone executable app for Windows. Enjoy your Fansly content offline anytime, anywhere, in the highest possible resolution! Fully customizable to download in bulk or individually: photos, videos & audio from timelines, messages, collections & specific posts 👍
https://fansly.com/
GNU General Public License v3.0

Downloader is failing due to recent Rate Limiting update by Fansly #148

cinfulsinamon opened this issue 1 year ago

cinfulsinamon commented 1 year ago

Bug Description

For some creators I try to download from, the program fails to recognize posts after the first set it finds. It will even fail to find the first set of posts if a previous successful run happened recently. This doesn't seem to happen with all creators, or with downloads using the download_mode = Single setting.

Expected behavior

All Timeline posts from a creator should download.

Actual behavior

Only the first set of posts found were downloaded.

Environment Information

Additional context

I had issues with the Windows executable, so I tried the latest Python script to see if the problem was solved there, and it was not. Adding some debug lines to print the request output shows that the first request succeeds and the timeline_cursor is correctly updated to the last entry, but the second request comes back with all fields present but empty. Adding an extra delay between each request seems to fix the issue.
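
For illustration, here is a minimal sketch of that workaround: pausing between paginated timeline requests. The endpoint URL, parameter names, and response fields below are assumptions for the example, not the script's actual code.

```python
import time

import requests


def fetch_timeline_with_delay(session: requests.Session, timeline_url: str,
                              delay: float = 5.0) -> list:
    """Walk a paginated timeline, sleeping between requests so the follow-up
    request doesn't come back with all fields present but empty."""
    posts, cursor = [], "0"
    while True:
        # "before", "response", "posts" and "id" are illustrative names,
        # not the confirmed API schema.
        resp = session.get(timeline_url, params={"before": cursor})
        page = resp.json().get("response", {})
        if not page.get("posts"):          # empty page: rate limited or end of timeline
            break
        posts.extend(page["posts"])
        cursor = page["posts"][-1]["id"]   # advance the cursor to the last entry returned
        time.sleep(delay)                  # the extra delay avoids the empty second page
    return posts
```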

Avnsx commented 1 year ago

Thanks for creating this issue ticket. I was just looking into this, as someone also mentioned it yesterday in https://github.com/Avnsx/fansly-downloader/discussions/127#discussioncomment-6797157

It seems Fansly has recently introduced some kind of rate limiting in their backend. I've already figured out how to fix it and will adjust for the change within the next few days, in version 0.4.2.

Avnsx commented 1 year ago

A temporary fix for this has been deployed to the executable versions of fansly downloader in version 0.4.1-post1. Version 0.4.2 of fansly downloader is not packageable with PyInstaller at the moment (even though it's already available as raw Python code on GitHub), and I'm in a hurry to head out for vacation, so I can't fix and re-package it. Catch y'all in a month 👋

Sebastian1175 commented 1 year ago

I tried the executable and the most recent 0.4.2 Python version of fansly downloader, and it still has the same rate-limiting problem as before.

Avnsx commented 1 year ago

So it appears that just after I bypassed the first introduction of rate limiting by switching back to the old Fansly API endpoint for timeline downloads, they noticed and adjusted their website code to apply the rate limiting to that endpoint too. This change happened just a few hours after I released fansly-downloader's 0.4.1-post1 version, which makes me think they're now actively watching this downloader's commit history and counter-patching my changes 🤣

Anyway, can you guys try out this branch of version 0.4.2 and see if it solves the rate-limiting issue again? Within that branch, fansly-downloader is just artificially slowed down to avoid hitting the rate limit. I'm on vacation for a few weeks, chilling on the beach, so I don't have access to a Python environment (or a PC), and I won't go out of my way to change that.

Additionally, I noticed they're introducing more variables/tokens with each request to the API endpoints, to further validate the requests on their backend. If they've already added logging to see which requests are not sending these new tokens, they can already tell which requests came from third-party code like fansly-downloader (as of version 0.4.2 these tokens are still not replicated). It's also very possible that the rate limiting is only applied when these tokens are not sent, because last time I checked, scrolling around on their website still instantly loads all media content, which means no rate limiting is applied there. That would require further testing, which I don't currently have the time for.

lordoffools commented 1 year ago

Strangely, I don't always hit this rate-limit issue.

Sometimes it goes all the way, and sometimes I get this:

WARNING | 12:29 || Low amount of Pictures scraped. Creators total Pictures: 1683 | Downloaded: 300
WARNING | 12:29 || Low amount of Videos scraped. Creators total Videos: 873 | Downloaded: 113

Sometimes it downloads only 10 items, and sometimes it downloads thousands.

Is it possible to slow down even further on our side (by exposing the rate-limit delay as a configurable parameter)?

lordoffools commented 1 year ago

If it helps, I am using the forked (0.4.2) version.

lordoffools commented 1 year ago

Another data point: after the failure I noted above, I tried a different creator, and it's been scraping successfully for a while now. We'll see what the final count is when it's done; I'll update once it's complete.

Update: The new run (for a different creator) ended successfully:

Finished Normal type, download of 2911 pictures & 461 videos! Declined duplicates: 30

So, I'm puzzled as to why some creator scrapes are throttled, and others are not (especially when those that aren't sometimes have way more content).

Avnsx commented 1 year ago

I am using the forked (0.4.2) version.

Can you try out this branch and let me know if it reliably passes the rate limit every time?

lordoffools commented 1 year ago

Can you try out this branch

Done. Tested multiple times.

It does not successfully pass the rate-limit all the time. At least, there are some creators where it fails all the time.

There are some where it passes 100% of the time.

I'm not entirely sure why.

Avnsx commented 1 year ago

It does not successfully pass the rate-limit all the time. At least, there are some creators where it fails all the time.

Looks like the most efficient way to handle this would be a function that, before starting timeline downloads, measures whether a rate limit even exists for a specific creator and dynamically adjusts the wait time based on the result. It would be cool if someone contributed that; otherwise I'll write it myself when I return from my vacation in a few weeks.

But for now you might as well just raise this sleep timer from 5, 6 to whatever reliably passes the rate limit every time, e.g. 7, 8.
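
One possible shape for that calibration step, sketched under assumptions (the `fetch_page` callable, the growth factor, and the thresholds are placeholders, not the project's actual code):

```python
import time


def calibrate_timeline_delay(fetch_page, start: float = 5.0, ceiling: float = 120.0) -> float:
    """Probe a creator's timeline with increasing delays until two consecutive
    pages come back non-empty, then return that delay for the real download run.

    `fetch_page(cursor)` stands in for whatever performs a single timeline
    request and returns the list of posts (empty when rate limited)."""
    delay = start
    while delay <= ceiling:
        first = fetch_page("0")
        time.sleep(delay)
        second = fetch_page(first[-1]["id"]) if first else []
        if first and second:
            return delay          # this wait time got past the limit twice in a row
        delay *= 1.5              # still hitting the limit; try a longer pause
        time.sleep(delay)
    return ceiling                # give up calibrating and just use the maximum
```

The download loop could then use the returned value as its per-request sleep instead of a hard-coded range.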

lordoffools commented 1 year ago

Thanks for the tip! I'll play around with the sleep timer and report back on my findings.

Sebastian1175 commented 1 year ago

Were you able to solve the problem? I can't tell from reading your comments.

lordoffools commented 1 year ago

Were you able to solve the problem? I can't tell from reading your comments.

The author says they will work on it when they return, and is also asking for contributors to help solve it and write the code.

plywood234 commented 1 year ago

I set my sleep timer to 105,108 and it started working on an account that previously did not scrape much. It probably doesn't need to be that crazy but it's definitely an issue with the sleep timer.

Edit: 72,75 worked but 52,55 did not work.

lordoffools commented 1 year ago

I did something similar. I have it set to 120, 240 right now, and it's working on all the ones I shared above that failed previously (and consistently).

Obviously it's taking forever, and not every creator required 120, so I'm not sure why some do and some don't.

Sebastian1175 commented 1 year ago

I set my sleep timer to 105,108 and it started working on an account that previously did not scrape much. It probably doesn't need to be that crazy but it's definitely an issue with the sleep timer.

Edit: 72,75 worked but 52,55 did not work.

It is slow as hell, but yes it works ok-ish with 72,75. Thank you

lordoffools commented 1 year ago

I finally encountered a creator that I cannot scrape with 120, 122. Doubling the numbers now to see if that helps (and yes, it'll take ages and ages).

lordoffools commented 1 year ago

Confirmed: I have an example of a creator where no matter how high I set the delay, it still fails.

LastInvoker commented 1 year ago

Confirmed: I have an example of a creator where no matter how high I set the delay, it still fails.

Same here, I can only scrape back to Jan 2023; everything older fails.

lordoffools commented 1 year ago

Confirmed: I have an example of a creator where no matter how high I set the delay, it still fails.

Same here, I can only scrape back to Jan 2023; everything older fails.

It has nothing to do with the age of the posts, it seems. I've had some creators that don't pull anything before August 2023, some that don't pull anything before yesterday... and some that pull 100% successfully. This is repeatable, so it appears to be creator specific.

Very confusing to me.

LastInvoker commented 1 year ago

I always receive the error that there is no media on the current cursor; I don't know what to change anymore XD

Bearded-Baguette commented 1 year ago

I think I created a workaround for the rate limiting. I used the sleep function created above and added retry attempts after each sleep. If the program fails to pull posts from a timeline, it will wait X seconds then try to pull the same timeline. It seems to take 5-8 attempts, but it can take more sometimes. After it successfully pulls posts from the timeline, the number of retry attempts resets.

I created a pull request with these changes, but I'm not sure what the process is for reviewing those changes. It's definitely not a perfect fix, but it seems to push through the rate limiting most of the time.
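
Roughly, the retry idea looks like this (a sketch only, not the actual pull request code; the `fetch_page` helper and the wait range are made-up names for illustration):

```python
import random
import time


def fetch_page_with_retries(fetch_page, cursor: str, max_attempts: int = 10,
                            wait_range: tuple = (5, 20)) -> list:
    """Re-request the same timeline page after a sleep whenever it comes back
    empty. The attempt counter only covers this page; a successful pull lets
    the next page start with a fresh set of attempts."""
    for attempt in range(1, max_attempts + 1):
        posts = fetch_page(cursor)
        if posts:
            return posts                      # got content; caller moves on to the next cursor
        wait = random.uniform(*wait_range)
        print(f"Empty page on attempt {attempt}/{max_attempts}; retrying in {wait:.0f}s")
        time.sleep(wait)
    return []                                 # still empty after every attempt
```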

LastInvoker commented 1 year ago

I think I created a workaround for the rate limiting.

Can you please upload the part where you made these changes?

Bearded-Baguette commented 1 year ago

Can you please upload the part where you made these changes?

Sure thing, I think you can check it out on my branch here. This is my first time trying to fork a branch on GitHub, so please let me know if you can't get to it. There's also a pull request with the changes I made.

As a side note, I played around with increasing the number of attempts and the timer. 20 attempts at 5-20 second intervals is slow, but it was able to go through a page with content going back to late 2021 in a few hours.

Avnsx commented 1 year ago

Looks like the most efficient way to handle this would be a function that, before starting timeline downloads, measures whether a rate limit even exists for a specific creator and dynamically adjusts the wait time based on the result.

After reading what you guys have said, I need to correct myself. Considering some of you need 70+ second wait timers, it would be more beneficial to just replicate whatever the Fansly website is doing, since the site obviously allows scrolling around and loading media instantly. As I pointed out before, I've seen some newly introduced identifier/auth tokens in the timeline requests; if I had to take a wild guess, they probably introduced a requirement where a JavaScript component, running in a real browser, creates those tokens for each timeline request before it is sent, and with those tokens present the rate limiting is simply not applied. Replicating that with Python and specific third-party libraries will most likely get rid of the need to wait so long between requests.

Fansly devs, if you read this: I would be fine just keeping static 5-second timers between each request, but anything above that forces me into a proper replication, which will in turn load up your servers with requests again. Down for a gentleman's agreement that works for both sides? Keep in mind that even if I ceased service of this tool, someone else would re-create it (in fact, there are already multiple people actively maintaining scrapers for Fansly), so even for you it would be profitable to stick with this. It's an average case of don't blame the player, blame the game 🫣

melithine commented 10 months ago

I think I created a workaround for the rate limiting. I used the sleep function created above and added retry attempts after each sleep. If the program fails to pull posts from a timeline, it will wait X seconds then try to pull the same timeline. It seems to take 5-8 attempts, but it can take more sometimes. After it successfully pulls posts from the timeline, the number of retry attempts resets.

What about doing an incremental backoff timer based on the retry attempts? I.e., assuming the initial value is 1s for attempt 1, use 2s for attempt 2, 4s for attempt 3, 8s for attempt 4, etc. If someone set it to 5s, it would go 5s/10s/20s/40s/etc.
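
A sketch of that suggestion, reusing the same hypothetical `fetch_page` helper as the earlier examples; the wait simply doubles after each failed attempt:

```python
import time


def fetch_page_with_backoff(fetch_page, cursor: str, base_delay: float = 5.0,
                            max_attempts: int = 8) -> list:
    """Retry an empty timeline page with an exponentially growing delay:
    base, 2*base, 4*base, ... so a 5s base waits 5s/10s/20s/40s and so on."""
    for attempt in range(max_attempts):
        posts = fetch_page(cursor)
        if posts:
            return posts
        time.sleep(base_delay * (2 ** attempt))   # double the wait after every empty response
    return []
```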