apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.64k stars 319 forks

Integrate Camoufox into PlaywrightCrawler #684

Open Ehsan-U opened 1 week ago

Ehsan-U commented 1 week ago

crawlee-python is great, but it often lacks stealth against modern anti-bot measures when using vanilla Playwright. Camoufox is fully compatible with Playwright; only the browser initialization has to change. It bypasses all currently available anti-bot measures. Integrating it would be a great opportunity to make crawlee-python's Playwright crawling stealthy.
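To illustrate the "only the browser initialization has to change" point, here is a minimal sketch using the sync APIs of Playwright and Camoufox directly. It is not how Crawlee wires browsers internally and not a proposed implementation; the helper names `vanilla_firefox`, `camoufox_firefox`, and `fetch_title` are hypothetical and exist only for this example. It assumes the `playwright` and `camoufox` packages (and their browser binaries) are installed when the respective launcher is actually called.

```python
from contextlib import contextmanager
from typing import Any, Callable, ContextManager, Iterator


@contextmanager
def vanilla_firefox(headless: bool = True) -> Iterator[Any]:
    """Hypothetical helper: launch stock Firefox via Playwright's sync API."""
    # Imported lazily so the module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=headless)
        try:
            yield browser
        finally:
            browser.close()


@contextmanager
def camoufox_firefox(headless: bool = True) -> Iterator[Any]:
    """Hypothetical helper: launch Camoufox, which yields a Playwright Browser."""
    # Imported lazily so the module loads even without Camoufox installed.
    from camoufox.sync_api import Camoufox

    with Camoufox(headless=headless) as browser:
        yield browser


def fetch_title(launch: Callable[[], ContextManager[Any]], url: str) -> str:
    """Page-handling code is identical no matter which launcher is passed in."""
    with launch() as browser:
        page = browser.new_page()
        page.goto(url)
        return page.title()
```

Under these assumptions, switching from `fetch_title(vanilla_firefox, url)` to `fetch_title(camoufox_firefox, url)` is the only change; everything that touches `page` stays the same.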

janbuchar commented 1 week ago

Hello @Ehsan-U, thank you for the suggestion. We will consider it!