Closed Ehsan-U closed 1 month ago
I think it's pretty straightforward to expose the `crawler.run` method using FastAPI, for instance. The following snippet comes without any guarantees, but it shouldn't be too incorrect :smile:
```python
from typing import Any

from fastapi import FastAPI
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

crawler = PlaywrightCrawler()


@crawler.router.default_handler
async def handler(context: PlaywrightCrawlingContext) -> None:
    pass


app = FastAPI()


@app.post("/crawl")
async def crawl(url: str) -> Any:
    await crawler.run([url])
    return await crawler.get_data()
```
Note that this has to be run as a FastAPI app, e.g. with `fastapi run`.
It is possible that we will implement something even more streamlined in the future though, so feel free to throw ideas here.
Scrapy offers an HTTP API through a third-party library called ScrapyRT, which exposes spiders over HTTP. By sending ScrapyRT a request with the spider name and a URL, you receive the items the spider collected from that URL. It would be great if Crawlee could provide similar functionality out of the box. It's like turning a website into a real-time API.