USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks

ScanDirectory FIXME in Injector.scala #91

Closed: sk-s-hub closed this 7 years ago

sk-s-hub commented 7 years ago

Let me know if there are any issues and I will fix them. I have also left a TODO in the function; let me know if regex-based file selection is required.

thammegowda commented 7 years ago

@SHASHANK-PRO-05 Thanks for the PR. Good work 👍, your changes look clean and well documented. I feel regex would be overkill for this job, so matching on the .txt extension is good enough.

I will give it a try and merge this. Let's wait a day or two so others have a chance to review and comment.

karanjeets commented 7 years ago

@SHASHANK-PRO-05 👍 Works like a charm! I will add URL validation on top of this to ensure seed quality.
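
For readers following the thread, here is a minimal sketch of the idea discussed above: scan a seed directory, pick up only `.txt` files with a plain suffix check instead of a regex, and drop lines that do not look like valid URLs. This is not the code from the PR or from Injector.scala; the names `SeedDirScan`, `seedFiles`, `isValidUrl`, and `readSeeds` are hypothetical, and the validation rule (absolute http or https URL with a host) is an assumption about what "URL validation" could mean here.

```scala
import java.io.File
import java.net.URI
import scala.io.Source
import scala.util.Try

// Hypothetical helper, not part of Sparkler's Injector.scala.
object SeedDirScan {

  // Recursively collect files under `root` whose names end with ".txt".
  // A plain suffix check is used instead of a regex.
  def seedFiles(root: File): Seq[File] = {
    if (root.isFile) {
      if (root.getName.endsWith(".txt")) Seq(root) else Seq.empty
    } else {
      // listFiles returns null for missing or unreadable paths
      Option(root.listFiles).map(_.toSeq).getOrElse(Seq.empty).flatMap(seedFiles)
    }
  }

  // Assumed validity rule: an absolute http(s) URL that has a host.
  def isValidUrl(line: String): Boolean =
    Try(new URI(line.trim)).toOption.exists { u =>
      val scheme = Option(u.getScheme).map(_.toLowerCase)
      scheme.exists(s => s == "http" || s == "https") && u.getHost != null
    }

  // Read every seed file and keep only the lines that pass validation.
  def readSeeds(root: File): Seq[String] =
    seedFiles(root).flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().map(_.trim).filter(isValidUrl).toList
      finally src.close()
    }
}
```

Calling `SeedDirScan.readSeeds(new File("seeds"))` on a directory of seed files would return the valid URLs found in its `.txt` files; files with other extensions and malformed lines are skipped silently, which a real implementation might prefer to log.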