USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

push chrome fetcher code #173

Closed: buggtb closed this issue 5 years ago

buggtb commented 5 years ago

I have a Chrome fetcher that integrates with headless Chrome running in Docker containers for crawling. I should push it!

buggtb commented 5 years ago

It'll need a bit of tidying up, but I've tested it against Browserless (https://hub.docker.com/r/browserless/chrome/) and it worked well for my use case. I'll document it on the wiki.