Nezteb closed this issue 6 months ago.
Hard to say. I haven't had a chance to explore these two tools yet. In some of my previous projects, PhantomJS was used for browser rendering, but it seems to be mostly dead now.
It would be interesting to see an example fetcher for Playwright or Puppeteer. Maybe we can add it to Crawly as a standard fetcher :) Just let me know how it goes!
As a non-Elixir example, I just built a scraper that uses Playwright to save each page of a site as a PDF: https://github.com/Nezteb/scrape-pdf
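Here is a minimal Python sketch of the page-to-PDF step. It is not the code from scrape-pdf, and it assumes Playwright's Python sync API is installed along with Chromium. The helper names `pdf_filename_for` and `save_page_as_pdf` are hypothetical. `page.pdf()` is Playwright's own call and only works with Chromium. The sketch handles a single page, so a crawler would call it once for each URL it discovers.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit


def pdf_filename_for(url: str) -> str:
    """Hypothetical helper: turn a page URL into a filesystem-safe PDF file name."""
    parts = urlsplit(url)
    # Host + path only; query strings are ignored, so ?page=1 and ?page=2 collide.
    raw = f"{parts.netloc}{parts.path}".rstrip("/") or "index"
    # Replace every run of characters other than letters, digits, '.' and '-' with '_'.
    return re.sub(r"[^A-Za-z0-9.-]+", "_", raw) + ".pdf"


def save_page_as_pdf(url: str, out_dir: str = ".") -> str:
    """Hypothetical helper: render one page in headless Chromium and save it as a PDF."""
    # Imported lazily so pdf_filename_for() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    path = Path(out_dir) / pdf_filename_for(url)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # page.pdf() requires Chromium
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            page.pdf(path=str(path))
        finally:
            browser.close()
    return str(path)
```

For example, `save_page_as_pdf("https://example.com/docs/intro/")` would write `example.com_docs_intro.pdf` to the current directory.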
Next weekend I'll see what I can do about a crawly fetcher for it!
https://github.com/mechanical-orchard/playwright-elixir will probably be able to support what you are looking for.
Oh nice, I'll check that out! I'll see if I can get a minimal demo of using crawly along with playwright-elixir as the fetcher!
Currently crawly has an implementation for Splash: https://github.com/elixir-crawly/crawly/blob/5eeeb2a3ba230ee55d2411a64f9e426957dc8c40/lib/crawly/fetchers/splash.ex

I tend to use Playwright (or Puppeteer if I only care about Chromium) for browser automation and testing, so it'd be cool to be able to use some of its functionality from crawly.

The only thing I'm unsure of is whether or not Playwright exposes a request page/API like Splash does.
I might end up picking this up, but I figured I'd create an issue beforehand. 😄
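For comparison, here is a rough sketch of the two approaches. It is written in Python rather than Elixir and is not taken from Crawly. A Splash-style fetcher only needs to rewrite the crawl URL into a GET against Splash's `render.html` endpoint. The page to render goes in the `url` argument, and options such as `wait` are passed alongside it. A Playwright-based fetcher would instead drive the browser in-process through the library. The Splash address `localhost:8050`, the function names, and the UTF-8 decoding are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance running locally on its default port.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def splash_render_url(target_url: str, base_url: str = SPLASH_RENDER_URL, **splash_args) -> str:
    """Rewrite a crawl URL into a call to Splash's render.html HTTP endpoint."""
    # Splash receives the page to render as the `url` query argument; extra
    # arguments such as wait=2 are forwarded as additional query parameters.
    params = {"url": target_url, **splash_args}
    return f"{base_url}?{urlencode(params)}"


def fetch_via_splash(target_url: str, http_timeout: float = 30.0, **splash_args) -> str:
    """Splash style: a plain HTTP GET against the rendering service."""
    with urlopen(splash_render_url(target_url, **splash_args), timeout=http_timeout) as resp:
        # Assumption: the rendered HTML is UTF-8 encoded.
        return resp.read().decode("utf-8")


def fetch_via_playwright(target_url: str, timeout_ms: float = 30_000) -> str:
    """Playwright style: drive a headless browser in-process, no HTTP service in between."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(target_url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()  # serialized DOM after JavaScript has run
        finally:
            browser.close()
```

The URL rewriting can be checked without any running service. `fetch_via_splash` needs a reachable Splash instance, and `fetch_via_playwright` needs `pip install playwright` followed by `playwright install chromium`.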