BuilderIO / gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL
https://www.builder.io/blog/custom-gpt
ISC License

WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Request blocked - received 429 status code. #137

Open Voyager3D opened 5 months ago

Voyager3D commented 5 months ago

I'm no coder and I've not scraped websites before, but I'm assuming this error code means the website is refusing my requests because I'm scraping it too much?

I was able to output a file from this website after it scanned 150 pages. That worked perfectly, but somewhere after 150 pages the site does not seem to like it, and I get this error: WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Request blocked - received 429 status code.

Not sure if I'm on the ball with that one or not, but any advice would be appreciated!

Cheers!

Cougart commented 5 months ago

Hi, I'm having the same issue with several websites. Is it possible to add a sleep option between two calls? I don't see any other possibilities. Thanks a lot!
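
For illustration, a sleep between two calls could look roughly like the sketch below. This is not an existing gpt-crawler option: `sleep`, `visitWithDelay`, and the `visit` callback are hypothetical names, and the code assumes pages are fetched one at a time rather than in parallel.

```typescript
// Hypothetical helper, not part of gpt-crawler: resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Visit each URL in order, pausing `delayMs` between consecutive calls.
// `visit` stands in for whatever actually loads and processes a page.
async function visitWithDelay<T>(
  urls: string[],
  visit: (url: string) => Promise<T>,
  delayMs: number,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(delayMs); // no pause before the very first call
    }
    results.push(await visit(urls[i]));
  }
  return results;
}
```

A fixed pause like this only lowers the request rate. Whether it is low enough depends on the limit the target site enforces, which the 429 response alone does not reveal.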

SimonGodefroid commented 4 months ago

429 is the "Too Many Requests" status code, so you may have been throttled by the server.

Meaning: to prevent clients from making too many requests, servers block requests coming from a given IP, either temporarily or permanently, after a given number of incoming requests. I'm not saying this is 100% your case, but it's the most probable scenario here.
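
If throttling is the cause, a common client-side response is to wait and retry, waiting longer after each 429. The sketch below shows that idea in isolation and is not how gpt-crawler or its crawler library handles retries. `SimpleResponse`, `retryDelayMs`, `requestWithBackoff`, and the default timings are all assumptions made for the example. It honors a `Retry-After` header only when the value is a number of seconds, and otherwise falls back to exponential backoff.

```typescript
// Hypothetical minimal response shape used only for this example.
interface SimpleResponse {
  status: number;
  retryAfter?: string; // raw Retry-After header value, if the server sent one
}

// How long to wait before retry number `attempt` (0-based).
// Uses Retry-After when it is a plain number of seconds; otherwise doubles
// the wait on each attempt, capped at `capMs`.
function retryDelayMs(
  attempt: number,
  retryAfter?: string,
  baseMs = 1000,
  capMs = 60000,
): number {
  if (retryAfter !== undefined && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
  }
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Repeat `doRequest` while the server answers 429, up to `maxRetries` retries.
// `wait` is injectable so the pause can be replaced in tests.
async function requestWithBackoff(
  doRequest: () => Promise<SimpleResponse>,
  maxRetries = 5,
  wait: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<SimpleResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) {
      return res; // success, a different error, or retries exhausted
    }
    await wait(retryDelayMs(attempt, res.retryAfter));
  }
}
```

With the defaults assumed here, the waits would be 1 s, 2 s, 4 s, and so on up to 60 s. If the block is permanent rather than temporary, no amount of waiting will help, and the last 429 response is simply returned.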