apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.02k stars 254 forks

feat: Project templates #237

janbuchar closed this 3 months ago

janbuchar commented 3 months ago

To test this:

cookiecutter gh:apify/crawlee-python --checkout project-templates --directory templates/beautifulsoup

Or, even better (the --spec ... part won't be necessary after the merge):

pipx run --spec git+https://github.com/apify/crawlee-python.git@project-templates crawlee create
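
For anyone who wants to script these checks, here is a minimal Python sketch that only assembles the two command lines above as argument lists. The helper names (`cookiecutter_command`, `pipx_command`) are hypothetical and not part of Crawlee; `beautifulsoup` is the only template directory named in this thread, so any other value passed in is an assumption.

```python
import shlex

# Values copied from the commands in this comment.
REPO = "gh:apify/crawlee-python"
BRANCH = "project-templates"  # PR branch; presumably not needed once merged


def cookiecutter_command(template: str, branch: str = BRANCH) -> list[str]:
    """Build the cookiecutter argv for one directory under templates/ (hypothetical helper)."""
    return [
        "cookiecutter", REPO,
        "--checkout", branch,
        "--directory", f"templates/{template}",
    ]


def pipx_command(branch: str = BRANCH) -> list[str]:
    """Build the pipx argv that runs `crawlee create` from a git branch (hypothetical helper)."""
    return [
        "pipx", "run",
        "--spec", f"git+https://github.com/apify/crawlee-python.git@{branch}",
        "crawlee", "create",
    ]


# Print the commands instead of running them, so the sketch has no side effects.
print(shlex.join(cookiecutter_command("beautifulsoup")))
print(shlex.join(pipx_command()))
```

To actually run one of them, the list can be passed to `subprocess.run(cmd, check=True)`, assuming cookiecutter or pipx is installed and on the PATH.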

janbuchar commented 3 months ago

This is still pretty barebones (see my comments), but I'd prefer to get the CLI to PyPI as quickly as possible and iterate upon that.