apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.15k stars 286 forks source link

How to run the crawler? #487

Closed Ehsan-U closed 1 month ago

Ehsan-U commented 1 month ago

Is there a way to run the entire project instead of running individual files with python3 file.py, similar to how Scrapy uses scrapy crawl <spider>? Does Crawlee (Python) have a command like this? I couldn't find any reference in the Crawlee Python documentation on how to run the entire project.

Ehsan-U commented 1 month ago

Project Structure:

UP-SCRAPER/ │ ├── crawler/ │ ├── crawler/ │ │ ├── __init.py │ │ ├── __main.py │ │ └── routes.py │ ├── poetry.lock ├── pyproject.toml └── README.md