apify / crawlee

Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
https://crawlee.dev
Apache License 2.0
15.51k stars, 665 forks

Improve Session Management guide. #796

Open: mnmkng opened this issue 4 years ago

mnmkng commented 4 years ago

The Session Management guide shows how to set up SessionPool, but it doesn't really explain how to use it for common use cases, such as sticking to a single IP until that IP gets blocked or stops working.
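A minimal sketch of the "single IP until it breaks" pattern, written as a standalone TypeScript model rather than against Crawlee itself. It assumes each session is pinned to one proxy IP, so keeping one session alive means keeping one IP. StickySessionPool, report() and the list of "blocked" status codes (401, 403, 429) are hypothetical choices for illustration, not Crawlee's SessionPool API.

```typescript
// Hypothetical model: reuse one session (and therefore one IP) until it is blocked.
// Not Crawlee's SessionPool; class, method names and status list are assumptions.

const BLOCKED_STATUS_CODES = new Set([401, 403, 429]); // assumed "blocked" signals

interface StickySession {
  id: string;
  usageCount: number;
  retired: boolean;
}

class StickySessionPool {
  private current: StickySession | null = null;
  private created = 0;
  private readonly maxUsageCount: number;

  constructor(maxUsageCount: number) {
    // Safety valve so a single session is not reused forever.
    this.maxUsageCount = maxUsageCount;
  }

  /** Returns the active session; creates a new one only if none exists, it was retired, or it hit the usage cap. */
  getSession(): StickySession {
    if (
      this.current === null ||
      this.current.retired ||
      this.current.usageCount >= this.maxUsageCount
    ) {
      this.created += 1;
      this.current = { id: `session_${this.created}`, usageCount: 0, retired: false };
    }
    return this.current;
  }

  /** Records a response; a blocked status retires the session so the next getSession() rotates. */
  report(session: StickySession, statusCode: number): void {
    session.usageCount += 1;
    if (BLOCKED_STATUS_CODES.has(statusCode)) {
      session.retired = true;
    }
  }
}

// Usage: the same session serves requests until the 403, then a fresh one takes over.
const pool = new StickySessionPool(1000);
const seen: string[] = [];
for (const code of [200, 200, 403, 200]) {
  const s = pool.getSession();
  seen.push(s.id);
  pool.report(s, code);
}
// seen → ["session_1", "session_1", "session_1", "session_2"]
```

Retiring only on a blocking signal keeps the IP stable for as long as the target tolerates it; the usage cap is an extra, arbitrary limit you can raise or drop.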

There's a brief mention of IP rotation in the Proxy Management guide, but it isn't detailed enough.
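One way to combine rotation with stickiness is to pin each session to a proxy URL and hand every new session the next URL in the list. The sketch below is a simplified, hypothetical model (SessionPinnedProxies and the placeholder proxy URLs are illustrative), not the implementation of Crawlee's proxy configuration.

```typescript
// Hypothetical sketch: pin each session ID to one proxy URL, assigning new
// sessions round-robin. Proxy URLs used below are placeholders, not real endpoints.

class SessionPinnedProxies {
  private readonly proxyUrls: string[];
  private readonly assigned = new Map<string, string>();
  private nextIndex = 0;

  constructor(proxyUrls: string[]) {
    if (proxyUrls.length === 0) throw new Error("need at least one proxy URL");
    this.proxyUrls = proxyUrls;
  }

  /** Same session ID → same proxy URL; a new session ID gets the next URL in the list. */
  proxyFor(sessionId: string): string {
    const existing = this.assigned.get(sessionId);
    if (existing !== undefined) return existing;
    const url = this.proxyUrls[this.nextIndex % this.proxyUrls.length];
    this.nextIndex += 1;
    this.assigned.set(sessionId, url);
    return url;
  }
}

// Usage
const proxies = new SessionPinnedProxies([
  "http://proxy-a.example:8000",
  "http://proxy-b.example:8000",
]);
proxies.proxyFor("session_1"); // → "http://proxy-a.example:8000"
proxies.proxyFor("session_1"); // → "http://proxy-a.example:8000" (sticky)
proxies.proxyFor("session_2"); // → "http://proxy-b.example:8000" (rotated)
```

Paired with a session that is retired when it gets blocked, the replacement session automatically lands on a different IP, which is the "rotate only when it breaks" behavior the guide could spell out.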

doaortu commented 3 months ago

Please do this. I don't know how to bring my own cookies into the crawler's session management.

Is it done through the createSessionFunction option, or do I need to set the cookies in the requestHandler each time?
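To make the two options in the question concrete, here is a standalone model of the first one: seed a session's cookie jar once, when the session is created, so every later request in that session sends the cookies without touching the request handler. This does not use Crawlee; CookieSeed, SeededSession and createSeededSession are hypothetical names, and the factory is only "in the spirit of" a createSessionFunction hook, whose exact signature the guide would need to confirm.

```typescript
// Hypothetical model: seed cookies once per session instead of per request.
// None of these names are Crawlee APIs.

interface CookieSeed {
  name: string;
  value: string;
  domain: string; // e.g. "example.com"; matches that host and its subdomains
}

class SeededSession {
  private readonly jar: CookieSeed[] = [];

  setCookies(cookies: CookieSeed[]): void {
    for (const c of cookies) {
      // Replace a cookie with the same name and domain instead of duplicating it.
      const i = this.jar.findIndex((x) => x.name === c.name && x.domain === c.domain);
      if (i >= 0) this.jar[i] = c;
      else this.jar.push(c);
    }
  }

  /** Builds a Cookie header for a URL from cookies whose domain matches its host. */
  cookieHeaderFor(url: string): string {
    const host = new URL(url).hostname;
    return this.jar
      .filter((c) => host === c.domain || host.endsWith(`.${c.domain}`))
      .map((c) => `${c.name}=${c.value}`)
      .join("; ");
  }
}

// Factory run once per new session (hypothetical stand-in for a session-creation hook).
function createSeededSession(seed: CookieSeed[]): SeededSession {
  const session = new SeededSession();
  session.setCookies(seed);
  return session;
}

// Usage: cookies exported from your own browser, loaded once.
const mySession = createSeededSession([
  { name: "sid", value: "abc123", domain: "example.com" },
  { name: "lang", value: "en", domain: "example.com" },
  { name: "other", value: "x", domain: "other.test" },
]);
mySession.cookieHeaderFor("https://shop.example.com/cart"); // → "sid=abc123; lang=en"
```

Seeding at creation time means a retired and recreated session gets the cookies again automatically, whereas setting them in the request handler repeats the work on every request.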

mnmkng commented 3 months ago

@souravjain540 probably worth looking into, given the 11 upvotes.