apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.64k stars 319 forks

How to enqueue_links without selector #728

Open robert-elles opened 7 hours ago

robert-elles commented 7 hours ago

How can I enqueue a link that I have already extracted from the HTML? I don't need a selector; I already have a relative link that I just want to enqueue. The docs only show examples where one defines a selector and Crawlee extracts the links.

vdusek commented 6 hours ago

Hi, if you have the link extracted and you want to just enqueue it, you can use the add_requests method.

It should be accessible from every context:

await context.add_requests(['https://crawlee.dev/'])

and from every crawler type:

await crawler.add_requests(['https://crawlee.dev/'])
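
Since the question mentions a relative link, and the examples above pass absolute URLs, it is probably safest to resolve the link against the page URL before enqueueing it. A minimal sketch using only the standard library; the helper name to_absolute and the sample URLs are made up for illustration, and using context.request.url as the base is an assumption about where the link was found:

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, link: str) -> str:
    """Resolve a possibly relative link against the URL of the page it came from."""
    return urljoin(page_url, link)


# Hypothetical values, for illustration only.
page_url = 'https://crawlee.dev/python/docs/quick-start'
relative_link = '../api'

absolute_url = to_absolute(page_url, relative_link)

# Inside a request handler this would look roughly like (assumed usage):
#     url = to_absolute(context.request.url, relative_link)
#     await context.add_requests([url])
```

Links that are already absolute pass through urljoin unchanged, so the helper can be applied to every extracted link without checking first.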