USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Solving issues #63 and #65 (#74)

Closed: karanjeets closed this 7 years ago

karanjeets commented 7 years ago

Added a read timeout to solve #63
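A read timeout bounds how long a fetch can block on a server that has accepted the connection but stopped sending data. The PR's actual change is not shown here, and Sparkler itself runs on the JVM. Purely as an illustration, here is a minimal Python sketch using the standard library's `urllib.request.urlopen`, whose `timeout` argument applies to the connect and to each blocking read. The `fetch` helper and the `DEFAULT_TIMEOUT_SECONDS` value are hypothetical.

```python
import socket
import urllib.error
import urllib.request

# Hypothetical default; the timeout value used in the PR is not shown.
DEFAULT_TIMEOUT_SECONDS = 5.0


def fetch(url, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Fetch `url`, giving up if any blocking socket step stalls past `timeout`.

    urllib applies `timeout` to the connect and to every read, so a server that
    accepts the connection but never answers cannot hang the caller forever.
    Returns (status, body) on success, or (None, reason) on failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except socket.timeout:
        # Raised directly when the server goes silent while we wait for the response.
        return None, "timeout"
    except urllib.error.URLError as exc:
        # Timeouts during the connect/send phase arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None, "timeout"
        return None, str(exc.reason)
```

Note that the timeout is per blocking operation, not a cap on the whole download: a server that trickles a byte just under the limit can still keep a fetch alive longer, so a crawler may also want an overall deadline.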

Added a URL validator from Apache Commons to solve #65
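On the JVM, Apache Commons Validator provides this check through its `UrlValidator` class. As a rough stand-in, here is a minimal Python sketch built on the standard library's `urllib.parse.urlsplit`. It only accepts absolute http(s) URLs with a host, no whitespace, and a valid port, so it is looser than the Commons validator. The `is_valid_url` helper and the `ALLOWED_SCHEMES` whitelist are hypothetical, not the configuration used in the PR.

```python
from urllib.parse import urlsplit

# Hypothetical scheme whitelist; the PR's validator configuration is not shown.
ALLOWED_SCHEMES = ("http", "https")


def is_valid_url(url, schemes=ALLOWED_SCHEMES):
    """Return True only for absolute URLs with an allowed scheme and a host.

    Malformed outlinks (relative paths, javascript: links, bad ports,
    embedded whitespace) are rejected before they reach the fetcher.
    """
    if not isinstance(url, str) or any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)  # raises ValueError on e.g. a broken IPv6 host
        _ = parts.port         # raises ValueError on a non-numeric or out-of-range port
    except ValueError:
        return False
    # urlsplit lowercases the scheme; hostname is None when no host is present.
    return parts.scheme in schemes and bool(parts.hostname)
```

Filtering before fetching keeps invalid URLs from being scheduled at all, rather than letting each one fail later inside a fetch task.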

karanjeets commented 7 years ago

@thammegowda Review and merge.

thammegowda commented 7 years ago

@karanjeets reviewing... It works. You nailed it ;-) Good job 👍

In addition, I am writing a test case to verify the behavior during timeouts. I will merge this PR along with the tests.

thammegowda commented 7 years ago

@karanjeets Done. 👍