apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0

It should be possible to instantiate `BasicCrawler` outside of an async context #236

Closed by janbuchar 3 months ago

janbuchar commented 3 months ago
import asyncio
from crawlee.basic_crawler import BasicCrawler

crawler = BasicCrawler()

asyncio.run(crawler.run())

This currently fails, probably because `AutoscaledPool` is instantiated in the `BasicCrawler` constructor, before any event loop is running.
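One possible way to address this kind of failure is to defer any setup that needs an event loop until `run()` is awaited. The sketch below only illustrates that pattern and does not use Crawlee's actual code: `EagerPool` and `LazyCrawler` are hypothetical stand-ins, and the assumption that the real failure comes from requiring a running loop at construction time is unconfirmed.

```python
import asyncio
from typing import Optional


class EagerPool:
    """Hypothetical stand-in for a component that needs a running loop when constructed."""

    def __init__(self) -> None:
        # asyncio.get_running_loop() raises RuntimeError if no loop is running.
        self._loop = asyncio.get_running_loop()


class LazyCrawler:
    """Hypothetical crawler that postpones loop-bound setup until run()."""

    def __init__(self) -> None:
        # Nothing here touches the event loop, so this is safe in sync code.
        self._pool: Optional[EagerPool] = None

    async def run(self) -> str:
        # Inside run() a loop is guaranteed to be running, so the pool
        # can be created safely on first use.
        if self._pool is None:
            self._pool = EagerPool()
        return "finished"


def eager_fails_outside_loop() -> bool:
    """Return True if constructing EagerPool outside a loop raises RuntimeError."""
    try:
        EagerPool()
    except RuntimeError:
        return True
    return False


crawler = LazyCrawler()  # constructed outside of any async context
result = asyncio.run(crawler.run())
```

With this arrangement the constructor stays usable from synchronous code, at the cost of the first `run()` call doing slightly more work.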