DIGITALCRIMINAL / ArchivedUltimaScraper

Scrape content from OnlyFans and Fansly
GNU General Public License v3.0

HTTP ERROR 429 #838

Open CannotTouch opened 1 year ago

CannotTouch commented 1 year ago

I think OF has changed something: scraping now gets stuck, and if you load the site from a browser you receive HTTP ERROR 429, so it's a temporary ban for too many requests. (Changing your IP resets the timer and the script starts working correctly again.)

How can we configure it to avoid this? If possible, add a delay between requests.
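
For illustration, a delay between requests could look like the sketch below. It is a minimal example and not code from UltimaScraper; the `Throttle` class and its parameters are hypothetical names. It only enforces a minimum gap between consecutive calls and is not thread-safe as written.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling `throttle.wait()` right before each request is enough when a single thread sends them; with several threads, the call would need to be wrapped in a lock.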

CannotTouch commented 1 year ago

I'm still hitting 429 with max threads set to 5.

wireman21 commented 1 year ago

Setting max threads to 1 works for me, although it's really slow.

CannotTouch commented 1 year ago

Suggestion: it should probably use the metadata to skip older content if a scrape was already done before, or at least not go all the way back to the start every time.
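
A rough sketch of that idea, assuming posts are listed newest-first and that IDs from an earlier run were saved somewhere. The `collect_new_posts` function, the `fetch_page` callable, and the `"id"` key are all hypothetical and do not reflect how UltimaScraper stores its metadata.

```python
from typing import Callable, Iterable


def collect_new_posts(
    fetch_page: Callable[[int], list[dict]],
    known_ids: Iterable[int],
) -> list[dict]:
    """Walk pages newest-first and stop at the first page containing a known post."""
    known = set(known_ids)
    new_posts: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset)  # hypothetical: returns a list of post dicts
        if not page:
            break
        unseen = [post for post in page if post["id"] not in known]
        new_posts.extend(unseen)
        if len(unseen) < len(page):
            # This page overlaps with an earlier run, so older pages were
            # already scraped and no further requests are needed.
            break
        offset += len(page)
    return new_posts
```

The saving comes from the early `break`: a repeat run over an unchanged profile costs one page request instead of a full walk back to the first post.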

CannotTouch commented 1 year ago

A little update from some tests: I did a fresh install at commit f60d2614fa9553c96ba7c2b39c30d044e155903f and everything works correctly with max threads set to 5. I only receive the 429 error when it scrapes the list of subscriptions; it probably doesn't apply the limit there. After resetting the IP it finished scraping the list, and then I scraped a model with over 2000 files without hitting the 429 error.

god43 commented 1 year ago

Well, the rate limit seems to trigger on metadata scans. Can we limit that to x threads?
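
One common way to cap how many scans run at once is a semaphore. The sketch below is generic asyncio code, not taken from the project; `run_limited` and its arguments are hypothetical names.

```python
import asyncio


async def run_limited(jobs, max_concurrency: int):
    """Run zero-argument coroutine functions with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job):
        # `async with` releases the slot even if the job raises.
        async with semaphore:
            return await job()

    # Results come back in the same order as `jobs`.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Here `jobs` would be the metadata scans and `max_concurrency` the "x threads" value, kept separate from the download thread count.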

CannotTouch commented 1 year ago

We are waiting for the owner to find the time and a way to solve it. At the moment it seems quite random to me: sometimes I receive the error, sometimes not. OF has probably set variable limits based on its traffic... I don't know...

datawhores commented 1 year ago

As long as you have the previous response, it should be possible to avoid a lot of scraping. I'm actually going to add this to my fork very soon; I'm just progressing through all the different post types.

https://github.com/excludedBittern8/ofscraper

Edit: caching may not be possible. The downloads have a policy key that seems to change frequently, possibly every day. Without it you cannot download.

CannotTouch commented 1 year ago

@DIGITALCRIMINALS any news about a workaround? :p

DIGITALCRIMINAL commented 1 year ago

Sorry, yes I have fixed it on my end and I know what's wrong. It's to do with the script throwing exceptions in the network manager.

Every uncaught exception it throws takes up a semaphore/thread and causes it to hang (or loop) forever. I've already handled it on my end but the commit relies on another unfinished commit that changes the way downloads are handled.

I don't mind pushing the commit that handles the exception, but the script won't be able to report that a download failed.
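
To illustrate the failure mode described above (this is not the actual fix in the repository): if a worker acquires a semaphore slot and then raises before releasing it, the slot is lost and later workers can wait forever. Releasing in a `finally` block avoids that. In this sketch, `download_all` and `fetch` are hypothetical names, and failures are collected in a list, which is an assumption about how reporting might work.

```python
import asyncio


async def download_all(urls, fetch, max_concurrency=5):
    """Return (succeeded, failed); `fetch` is a hypothetical coroutine function."""
    semaphore = asyncio.Semaphore(max_concurrency)
    succeeded, failed = [], []

    async def worker(url):
        await semaphore.acquire()
        try:
            await fetch(url)
            succeeded.append(url)
        except Exception as exc:
            # Record the failure instead of letting it escape the worker.
            failed.append((url, repr(exc)))
        finally:
            # Always give the slot back, whether the download worked or not.
            semaphore.release()

    await asyncio.gather(*(worker(url) for url in urls))
    return succeeded, failed
```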

CannotTouch commented 1 year ago

Take your time, don't worry. I'm happy to hear this news. Thanks for the support :)

DIGITALCRIMINAL commented 1 year ago

I probably fixed it in the latest commit. https://github.com/DIGITALCRIMINALS/UltimaScraperAPI/blob/93b7fd08ab7153e583cfa1c5ae50aab7878c8dab/ultima_scraper_api/managers/session_manager.py#L176

Script will detect 429 (rate limit) and automatically resolve itself by checking every 5 seconds. https://github.com/DIGITALCRIMINALS/UltimaScraperAPI/blob/93b7fd08ab7153e583cfa1c5ae50aab7878c8dab/ultima_scraper_api/managers/session_manager.py#L149

Personally I found that they only allow 1K requests per IP every 5 minutes. OF resets the rate limit every 5 minutes. You can still batch thousands of requests before the OF rate limiter kicks in.
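
The linked `session_manager.py` is the real implementation. As a separate, simplified sketch of the "detect 429 and check again every 5 seconds" behaviour, something like the following would work. `get_with_rate_limit_retry` and `send` are hypothetical names, and `send` is assumed to return an object with a `status_code` attribute.

```python
import time


def get_with_rate_limit_retry(send, url, poll_seconds=5.0, max_attempts=60,
                              sleep=time.sleep):
    """Call `send(url)` until it returns a status other than 429.

    The defaults (60 attempts, 5 seconds apart) span roughly one 5-minute
    window, matching the reset period reported in this thread.
    """
    for attempt in range(1, max_attempts + 1):
        response = send(url)
        if response.status_code != 429:
            return response
        if attempt < max_attempts:
            sleep(poll_seconds)  # wait before checking whether the limit has reset
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

If the observed budget of 1K requests per 5 minutes holds, that is 1000 / 300 s, or about 3.3 requests per second, so spacing requests roughly 0.3 s apart should stay under it.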

betoalanis commented 1 year ago

session_manager.py is not included in the latest UltimaScraper commit. Does it get added when updating, or should we manually add the UltimaScraperAPI?

Right now my setup is working, slow but I think good enough, so I want to be sure and not mess something up by replacing files I shouldn't.

avekifes commented 1 year ago

I'm currently processing a user with over 3,300 posts, and the script has been downloading at over 150 Mbps for over 24 hours and is still going at it...so the rate limiting definitely seems to be solved with the latest commit, although I'm not sure if that runtime is normal.

DIGITALCRIMINAL commented 1 year ago

Depends on how many threads you've set it at

CannotTouch commented 1 year ago

@DIGITALCRIMINALS thanks for the fix, but at the moment I cannot test it because sadly I've stumbled upon another error: https://github.com/DIGITALCRIMINALS/UltimaScraper/issues/953

betoalanis commented 1 year ago

session_manager.py is not included in the latest UltimaScraper commit, does it get added when updating? or should we manually add the UltimaScraperAPI?

@DIGITALCRIMINALS

CannotTouch commented 1 year ago

session_manager.py isn't in the same directory, but it is downloaded. Just run the update command to keep it at the latest version.