DIGITALCRIMINAL / ArchivedUltimaScraper

Scrape content from OnlyFans and Fansly
GNU General Public License v3.0

HTTP ERROR 429 #838

Open CannotTouch opened 1 year ago

CannotTouch commented 1 year ago

I think OF has changed something: scraping now gets stuck, and if you load the site from a browser you receive HTTP ERROR 429, so it's a temporary ban for too many requests. (Changing your IP resets the timer, and the script works correctly again.)

How can we configure it to avoid this? If possible, could a delay be inserted between requests?

maxcom99 commented 1 year ago

I also all of a sudden cannot login.

avekifes commented 1 year ago

I'm sad to say that this makes the script unusable, at least for people with a ton of subscriptions. Mainly because the script insists on processing every single account, even when messages are set to false.

Adding a parameter that lets the user input a delay is the most obvious solution, but a full scrape might take forever...

foxdude42 commented 1 year ago

I think a good long-term fix would be partial scraping plus limiting the number of concurrent requests with a semaphore. Currently, every post is re-scraped, even after being processed and recorded in the database. The post list is already sorted chronologically in descending order, so I think it's safe to finish scraping a subscription as soon as the last previously encountered post is seen, assuming all posts in the metabase db have been downloaded.
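For illustration, the early-stop idea could be sketched roughly as below. This is not the project's code: `new_posts_only`, the `id` field, and the sample feed are hypothetical, and the sketch relies on the same assumptions stated above (newest-first ordering, and every recorded post already downloaded).

```python
from typing import Iterable, Iterator, Set


def new_posts_only(posts: Iterable[dict], seen_ids: Set[int]) -> Iterator[dict]:
    """Yield posts until the first one that is already recorded.

    Assumes `posts` is sorted newest-first, so everything after the first
    known post was handled on an earlier run.
    """
    for post in posts:
        if post["id"] in seen_ids:
            break  # reached previously scraped territory; stop paging
        yield post


# Hypothetical data: ids 101 and 100 were stored by an earlier run.
feed = [{"id": 103}, {"id": 102}, {"id": 101}, {"id": 100}]
fresh = list(new_posts_only(feed, seen_ids={100, 101}))
print([p["id"] for p in fresh])  # [103, 102]
```

Because the function consumes the feed lazily, a paginated source would stop requesting further pages once a known post turns up. Anything that breaks strict ordering would need separate handling.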

DIGITALCRIMINAL commented 1 year ago

Does the max threads setting not work???

kccomikeyga commented 1 year ago

This sounds like a server side change. Reconnecting to a different VPN location or refreshing your ISP IP is a temporary workaround.

CannotTouch commented 1 year ago

> Does the max threads setting not work???

So far I still haven't found a good value for it... do you have any suggestions?

avekifes commented 1 year ago

I've only had time to test a value of 5, and it seems to help, although I'm not sure how far it can really go because of a story highlight error I keep running into.

foxdude42 commented 1 year ago

> Does the max threads setting not work???

Setting it to 2 or 3 prevents the 429 rate limit, but it takes forever to complete.

CannotTouch commented 1 year ago

10 is too much (I started out trying from 100, lol)... 5 seems like a good value to avoid the block.

GameCharmer commented 1 year ago

Max threads of 1 won't help if you have a fast system and connection. It'd probably be worthwhile to check for a 429 response, wait X amount of seconds, then try again, increasing that wait on every failed iteration. Once the 429 clears, back to normal.
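A minimal sketch of that retry idea, assuming a hypothetical `send` callable that performs one request and returns a `(status, body)` pair. The delays and attempt count are placeholders, not values known to work against any particular server.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, body); a stand-in for a real response


def fetch_with_backoff(
    send: Callable[[], Response],
    base_delay: float = 5.0,
    max_delay: float = 300.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it stops returning 429, doubling the wait each time."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body  # success or a different error; caller decides
        if attempt == max_attempts:
            break  # no point sleeping before giving up
        sleep(delay)
        delay = min(delay * 2, max_delay)  # grow the wait, up to a ceiling
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Simulated server: two 429 replies, then success. Waits are recorded, not slept.
replies = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = fetch_with_backoff(lambda: next(replies), sleep=waits.append)
print(result, waits)  # (200, 'ok') [5.0, 10.0]
```

The wait resets to `base_delay` on the next call, which matches the "once the 429 clears, back to normal" behaviour. If the server sends a `Retry-After` header with its 429 response, honouring that value would be a reasonable refinement.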

A fixed delay between actions would be helpful in single threaded cases.
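For the single-threaded case, a fixed delay could look something like this sketch. `FixedDelay` is a hypothetical helper, and the interval used here is only a placeholder.

```python
import time


class FixedDelay:
    """Enforce a minimum gap between consecutive actions in one thread."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous action, if any

    def wait(self) -> float:
        """Sleep just long enough to honour the gap; return the time slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.min_interval - (now - self._last))
            if pause:
                self._sleep(pause)
        self._last = now + pause
        return pause


throttle = FixedDelay(min_interval=0.01)
for item in ["a", "b", "c"]:
    throttle.wait()
    # ... perform one request for `item` here ...
```

Measuring from the previous action means slow requests are not penalised twice: if a request already took longer than the interval, the next one starts immediately.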

avekifes commented 1 year ago

I don't know how true that is, but I did try 3, and I still got throttled....

GameCharmer commented 1 year ago

> I don't know how true that is, but I did try 3, and I still got throttled....

I have it set to 1, wait a day between attempts, and have yet to get anything to download since the rate limiting started

icemouton commented 1 year ago

I can confirm the same thing happening here, even my browser is 429 for a while after running the script. Nothing downloads, set it to 1 thread for fun, didn't help.

avekifes commented 1 year ago

Why is the script even trying to process what seems like every account you've ever subscribed to regardless of what's picked? I feel like a lot of the existing problems could be avoided if it respected the choice...

headphonia commented 1 year ago

Even setting it to 1 thread isn't working for me anymore.

kccomikeyga commented 1 year ago

Keeping it set at one and being super patient works for me. I also have written an autohotkey script to input a digit then press enter, then wait 20 minutes, then repeat with one digit higher.

fg4jerem commented 1 year ago

I have a 15 mbps connection and I've faced this blocking a few times myself. It's been a while since I was last able to get a full successful run.

icemouton commented 1 year ago

I have a 3 Gbps connection and have been unable to scrape anything for weeks now. I'm at 1 thread... I'll look into ways of rate limiting the process itself, I guess... anyone have any ideas?

avekifes commented 1 year ago

> Keeping it set at one and being super patient works for me. I also have written an autohotkey script to input a digit then press enter, then wait 20 minutes, then repeat with one digit higher.

...how would that even work when the script goes through every account (read: way longer than 20 minutes) when you select a single model to scrape?

kccomikeyga commented 1 year ago

Messages=“false” so it doesn’t scrape paid content.

On Fri, Mar 3, 2023 at 11:53 AM avekifes wrote:

> > Keeping it set at one and being super patient works for me. I also have written an autohotkey script to input a digit then press enter, then wait 20 minutes, then repeat with one digit higher.
>
> ...how would that even work when the script goes through every account (read: way longer than 20 minutes) when you select a single model to scrape?


CannotTouch commented 1 year ago

Something changed again today... it now runs into 429 very quickly.

DIGITALCRIMINAL commented 1 year ago

I'm rewriting the network function. Going to route network requests through one class.
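As a rough illustration of routing everything through one class, combined with the semaphore idea suggested earlier in the thread, a sketch might look like this. `RequestGate` and `fake_request` are hypothetical names and do not reflect the project's actual rewrite.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RequestGate:
    """Single choke point that caps how many requests run at once."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for inspection

    async def run(self, request: Callable[[], Awaitable[T]]) -> T:
        async with self._semaphore:  # waits here when the cap is reached
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await request()
            finally:
                self.in_flight -= 1


async def fake_request(n: int) -> int:
    await asyncio.sleep(0.01)  # stands in for network latency
    return n * 2


async def main() -> None:
    gate = RequestGate(max_concurrent=3)
    results = await asyncio.gather(
        *(gate.run(lambda n=n: fake_request(n)) for n in range(10))
    )
    print(results, "peak concurrency:", gate.peak)


asyncio.run(main())
```

With every call going through `run`, a limit (or later a backoff policy) only has to be changed in one place rather than at each call site.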

god43 commented 1 year ago

> I'm rewriting the download function

thanks.

misterscraper commented 1 year ago

Anyone able to scrape anything last few days?

Nostang3 commented 1 year ago

I've been able to download, but I've been doing single accounts and closing the window once it grabs the new items, before it goes into its scrape-all-messages phase.

JohnnyTowns94 commented 1 year ago

I've been scraping models one at a time as I've needed to. Haven't been timed out as quickly yet.

Nostang3 commented 1 year ago

I've noticed that I'm not able to download one model in particular: lilsummerhoe. This one always ends up in retries no matter how long I wait. I can also cancel the window and start with another model and it will work fine.

misterscraper commented 1 year ago

> I've been scraping models one at a time as I've needed to. Haven't been timed out as quickly yet.

When I choose them one by one, it ends up scraping all models, then just says complete when nothing has been downloaded.

betoalanis commented 1 year ago

> Anyone able to scrape anything last few days?

Yes, I was able to make it work with max threads=5,

although right now I only have 35 or so subscriptions, so I don't know if it helps if you have more than that.

misterscraper commented 1 year ago

> > Anyone able to scrape anything last few days?
>
> Yes, I was able to make it work with max threads=5,
>
> although right now I only have 35 or so subscriptions, so I don't know if it helps if you have more than that.

I'm only getting the pink scrape lines and then it says complete; nothing downloads apart from the avatars/headers. I also set my threads to 5 and have under 25 subscriptions.

DIGITALCRIMINAL commented 1 year ago

:skull: I joined the team earlier, but I got back access within 2 minutes :skull: I actually managed to remove the rate limiting manually. [image]

Also this current commit doesn't include the network limiter, next commit will.

misterscraper commented 1 year ago

> 💀 I joined the team earlier, but I got back access within 2 minutes 💀 I actually managed to remove the rate limiting manually. [image]
>
> Also this current commit doesn't include the network limiter, next commit will.

How'd you remove rate limiting?

DIGITALCRIMINAL commented 1 year ago

> How'd you remove rate limiting?

Never mind, I thought I did, but I just got unbanned after waiting 5 minutes.

Anyway, I added the rate limiter a few hours ago.

I'll resolve the 429 error automatically in a future commit.
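For readers wondering what such a rate limiter might look like, here is a minimal sliding-window sketch. It is not the limiter that was committed: `SlidingWindowLimiter` is a hypothetical name, and the budget in the usage lines is a placeholder to be tuned against whatever rate the server actually tolerates.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent calls, oldest first

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds waited."""
        now = self._clock()
        self._drop_expired(now)
        waited = 0.0
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest call in the window ages out.
            waited = self._stamps[0] + self.period - now
            self._sleep(waited)
            now = self._clock()
            self._stamps.popleft()
            self._drop_expired(now)
        self._stamps.append(now)
        return waited

    def _drop_expired(self, now: float) -> None:
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()


limiter = SlidingWindowLimiter(max_calls=5, period=1.0)
for i in range(3):
    limiter.acquire()  # returns immediately while under budget
    # ... issue request number `i` here ...
```

Setting the budget somewhat below the observed limit leaves headroom for requests made outside the script, such as normal browsing from the same IP.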

misterscraper commented 1 year ago

> > How'd you remove rate limiting?
>
> Never mind, I thought I did, but I just got unbanned after waiting 5 minutes.
>
> Anyway, I added the rate limiter a few hours ago.
>
>   • Script can do around 1k requests in a minute before you reach 429 error.
>   • You can also batch 10k+ requests (most I tried) and process them without semaphores but you'll get the 429 error once they're all finished.
>
> I'll resolve the 429 error automatically in a future commit.

I'm not able to download anything. When I run the script to go through all my 25 subscriptions, it does the processing bit (all pink lines), then completes. It only downloads avatars/headers.

DIGITALCRIMINAL commented 1 year ago

> > > How'd you remove rate limiting?
> >
> > Never mind, I thought I did, but I just got unbanned after waiting 5 minutes.
> >
> > Anyway, I added the rate limiter a few hours ago.
> >
> >   • Script can do around 1k requests in a minute before you reach 429 error.
> >   • You can also batch 10k+ requests (most I tried) and process them without semaphores but you'll get the 429 error once they're all finished.
> >
> > I'll resolve the 429 error automatically in a future commit.
>
> I'm not able to download anything. When I run the script to go through all my 25 subscriptions, it does the processing bit (all pink lines), then completes. It only downloads avatars/headers.

That's not the latest commit then, because I disabled the ability to download avatars and headers months ago.

misterscraper commented 1 year ago

[Screenshot 2023-03-13 at 12 29 28]

Is this not the correct one?

DIGITALCRIMINAL commented 1 year ago

> [Screenshot 2023-03-13 at 12 29 28]
>
> Is this not the correct one?

Yeah, but it's preferred that you use updater.py to update the script instead, since it'll update all the packages.

I even deleted the profile scraper here

https://github.com/DIGITALCRIMINALS/OnlyFans/blob/d52abd3b41961c03a1dccb105769f2ac7ba8d325/ultima_scraper/modules/module_streamliner.py#L208

misterscraper commented 1 year ago

> > [Screenshot 2023-03-13 at 12 29 28]
> >
> > Is this not the correct one?
>
> Yeah, but it's preferred that you use updater.py to update the script instead, since it'll update all the packages.

I'm using the latest commit, but nothing downloads. It gives me the pink progress lines and says it's complete, yet nothing has downloaded.

DIGITALCRIMINAL commented 1 year ago

Are you using a vpn/proxy

misterscraper commented 1 year ago

> Are you using a vpn/proxy

Yes

DIGITALCRIMINAL commented 1 year ago

Ahh okay, if it's a dynamic vpn, it won't work with the script since OnlyFans only allows downloads from the original IP address.

Basically it goes:

IP 1 -> Get Post -> IP 2 -> Download Media = Download Fails

I need to re-add the ability to get the post again if there's an Unauthorised status code, and also map proxies to posts.

It should work if you disable it or use a private IP.

misterscraper commented 1 year ago

> Ahh okay, if it's a dynamic vpn, it won't work with the script since OnlyFans only allows downloads from the original IP address.
>
> Basically it goes:
>
> IP 1 -> Get Post -> IP 2 -> Download Media = Download Fails
>
> I need to re-add the ability to get the post again if there's an Unauthorised status code, and also map proxies to posts.
>
> It should work if you disable it or use a private IP.

I've turned off my VPN. It gets so far and then seems to get stuck... I still don't see any progress with downloading anything....

[Screenshot 2023-03-13 at 15 41 02]

betoalanis commented 1 year ago

> That's not the latest commit then because I disabled the ability to download avatars and headers months ago

Wait, what? I've been using updater.py constantly and I'm still getting an avatar and header folder with the downloaded files in them. How can I verify that I'm on the latest commit after using the updater? Or do I need to delete everything and reinstall?

DIGITALCRIMINAL commented 1 year ago

Strange then, I would suggest deleting everything except the user_data folder and settings and running it again

betoalanis commented 1 year ago

> Strange then, I would suggest deleting everything except the user_data folder and settings and running it again

oh wait, yeah I think I see the difference, I recently migrated computer and I downloaded the zip file but it was like a few days before the "2023 migration" so I skipped that. I noticed because you mention a user_data folder and I don't have that, so I'll just try it right away. Thanks! :D

misterscraper commented 1 year ago

@DIGITALCRIMINALS would you still recommend max threads at 5?

DIGITALCRIMINAL commented 1 year ago

> @DIGITALCRIMINALS would you still recommend max threads at 5?

Personally, I'm using 32 and it's working fine for me

misterscraper commented 1 year ago

> > @DIGITALCRIMINALS would you still recommend max threads at 5?
>
> Personally, I'm using 32 and it's working fine for me

I've got mine set to 5, but for some reason it stalls on the same model...

misterscraper commented 1 year ago

[Screenshot SCR-20230314-jmqt]

misterscraper commented 1 year ago

When I go to the OnlyFans site I get this:

[Screenshot SCR-20230314-jpve]

And that's with 32 max threads...