USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
410 stars, 143 forks

Modified Readme for crawling seed-urls.txt #180

Closed: amirhosf closed this 4 years ago

amirhosf commented 4 years ago

What changes were proposed in this pull request?

Only the README was changed. When injecting seed URLs into the crawler from the list of websites in seed-urls.txt, the command has to be run with 'bash' in the macOS terminal because of differences in bash versions.
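
For readers unfamiliar with the bash-version issue mentioned above, here is a minimal sketch, not taken from the README, of how to check which bash a machine would use when a script is run through 'bash' explicitly. It rests on two assumptions: that a `bash` binary is on the PATH, and that the relevant difference is the major version (macOS has long shipped bash 3.2 as /bin/bash). The helper name `bash_major` and the script name `your-script.sh` are hypothetical and used only for illustration.

```shell
# Hypothetical helper: ask the 'bash' found first on PATH for its major version.
# Calling 'bash -c' explicitly means the answer does not depend on the shell
# that happens to be running this snippet (zsh, sh, dash, ...).
bash_major() {
  bash -c 'echo "${BASH_VERSINFO[0]}"'
}

# Report the version so it is clear which bash an explicit invocation would use.
echo "bash on PATH has major version $(bash_major)"

# Illustrative only (placeholder script name): running a script through 'bash'
# explicitly, rather than relying on the login shell or the script's shebang.
#   bash your-script.sh
```

If the reported major version is 3, scripts that rely on bash 4+ features may behave differently on that machine, which is one plausible reading of the "differences in bash versions" noted above.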

thammegowda commented 4 years ago

thanks @amirhosf