apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.17k stars 291 forks

Selenium crawler #284

Open codejunction opened 3 months ago

codejunction commented 3 months ago

The library is sick

It would be a beautiful add-on if we could add a Selenium crawler.

It would connect to Selenium WebDriver or to remote drivers.

janbuchar commented 3 months ago

Hi, thanks for the kind words! Would you mind elaborating on why you prefer Selenium over Playwright, for instance? It is kinda tricky to work with Selenium in an async context, but feasible. So I guess we could put it on our roadmap if there is enough interest from the community 🙂
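
For illustration, here is a minimal sketch of one common way to use a blocking, Selenium-style client from async code: push each blocking call onto a worker thread with `asyncio.to_thread`. This is not Crawlee's API or a planned design. `fetch_title_blocking`, `fetch_title`, and `crawl` are hypothetical names, and the blocking function is a stand-in that only simulates the driver work, so the snippet runs without Selenium installed.

```python
import asyncio
import time


def fetch_title_blocking(url: str) -> str:
    """Hypothetical stand-in for synchronous driver work.

    With a real Selenium driver this is where blocking calls such as
    loading the page and reading its title would go (assumption).
    """
    time.sleep(0.1)  # simulate a blocking browser round trip
    return f"title of {url}"


async def fetch_title(url: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fetch_title_blocking, url)


async def crawl(urls: list[str]) -> list[str]:
    # Schedule all pages concurrently; results come back in input order.
    return await asyncio.gather(*(fetch_title(url) for url in urls))


if __name__ == "__main__":
    print(asyncio.run(crawl(["https://example.com/a", "https://example.com/b"])))
```

The tricky part in practice is that a driver instance should generally not be shared across threads, so a real integration would probably need one driver per worker or a dedicated executor per driver.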