USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Added instructions to run Sparkler crawl with a seed url file #161

Closed: prenastro closed this pull request 6 years ago

prenastro commented 6 years ago

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

Is this related to an already existing issue on sparkler?
If so, mention that issue by referencing its number here.

Will it close an existing issue?
Say 'Closes #IssueNum' here.

How was this patch tested?

We are particularly interested in unit tests, integration tests, manual tests you did to ensure that the patch works as expected, so briefly describe them.

Please review https://github.com/USCDataScience/sparkler/blob/master/.github/CONTRIBUTING.md before opening a pull request.

chrismattmann commented 6 years ago

thanks @prenastro!