USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks

Get pages Source code #151

Closed by remibacha 6 years ago

remibacha commented 6 years ago

Hello, could you please add a way to get the source code of all crawled pages (for example, a new column containing each page's raw source)? Thank you!