apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev
Apache License 2.0
15.39k stars 662 forks source link

RequestQueue should keep local cache of known requests to prevent unneeded `addRequest` calls #75

Closed mtrunkat closed 6 years ago

mtrunkat commented 6 years ago

Done