brave / pagegraph-crawl

Gather pagegraph data from all over the internet
Mozilla Public License 2.0
17 stars, 7 forks

Porting some features from Browsertrix Crawler or integrating it? #89

Open · L3thal14 opened this issue 3 months ago

L3thal14 commented 3 months ago

I recently came across the Browsertrix Crawler project, which appears to use Brave Browser for its crawls. Its features include support for custom browser behaviors (autoscroll, video autoplay, and site-specific behaviors), quality assurance (QA) crawling, screenshots, and screencasting.

Although PageGraph crashes when Puppeteer scripts are injected, I think some of these other features could be useful when running PageGraph.

pes10k commented 2 months ago

Sure, I’d love to hear more! What do you have in mind?

FWIW, I’m also working on making Puppeteer scripts work. No ETA, but it’s in progress.