Implement browser per proxy to PlaywrightCrawler

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

https://crawlee.dev/python/

Apache License 2.0


Implement browser per proxy to PlaywrightCrawler #720

Open vdusek opened 22 hours ago

vdusek commented 22 hours ago

Implement a browser-per-proxy option for PlaywrightCrawler, similar to the `browserPerProxy` option in Crawlee JS:
https://crawlee.dev/api/browser-pool/interface/LaunchContextOptions#browserPerProxy
Before starting the implementation, sync with @barjin, who can provide further context and suggest potential improvements (mostly regarding the session pool).
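
For discussion, here is a minimal sketch of the general idea, assuming "browser per proxy" means that each distinct proxy URL gets its own dedicated browser instance, which is then reused for later requests through the same proxy. The names `BrowserPerProxyPool`, `launch`, `get_browser`, and `fake_launch` are hypothetical and are not part of the Crawlee or Playwright APIs; the real implementation would plug into the existing browser pool and likely the session pool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

B = TypeVar("B")  # stands in for whatever browser object the launcher returns


@dataclass
class BrowserPerProxyPool(Generic[B]):
    """Hypothetical helper that keeps one browser instance per proxy URL."""

    # Assumed launcher callback: receives the proxy URL (or None) and returns a browser.
    launch: Callable[[str | None], B]
    _browsers: dict[str | None, B] = field(default_factory=dict)

    def get_browser(self, proxy_url: str | None) -> B:
        # Launch a new browser only the first time a proxy URL is seen.
        if proxy_url not in self._browsers:
            self._browsers[proxy_url] = self.launch(proxy_url)
        return self._browsers[proxy_url]

    def __len__(self) -> int:
        return len(self._browsers)


# Stand-in launcher for demonstration; a real one would start a Playwright browser.
launched: list[str | None] = []


def fake_launch(proxy_url: str | None) -> object:
    launched.append(proxy_url)
    return object()


pool = BrowserPerProxyPool(launch=fake_launch)
browser_a = pool.get_browser("http://proxy-a.example:8000")
browser_b = pool.get_browser("http://proxy-b.example:8000")
browser_a_again = pool.get_browser("http://proxy-a.example:8000")
```

In this sketch the number of live browsers grows with the number of distinct proxy URLs, so a real implementation would probably also need a way to cap or retire instances.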