USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Sparkler 147: Support for chaining multiple similar plugins, e.g. URLFilter #148

Closed: thammegowda closed this 6 years ago

thammegowda commented 6 years ago

What changes were proposed in this pull request?

Chains extensions together when multiple similar extensions are enabled. The plugin service is updated to detect such cases and, when feasible, to make all of them work by wrapping them in an ExtensionChain. Tested using two UrlFilters, but the interfaces and the chaining approach generalize to other plugins as well.
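A minimal sketch of the chaining idea, assuming URL filters compose with AND semantics (a URL is kept only if every enabled filter accepts it). The names `UrlFilter`, `UrlFilterChain`, and `PluginServiceSketch` are hypothetical stand-ins, not Sparkler's actual `URLFilter` or `ExtensionChain` API, and the "accept everything when nothing is enabled" fallback is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical extension point; NOT Sparkler's real URLFilter signature.
interface UrlFilter {
    /** Returns true if {@code url}, discovered on {@code parent}, should be kept. */
    boolean filter(String url, String parent);
}

// Hypothetical chain: wraps several enabled filters so callers see one UrlFilter.
// A URL passes only if every filter accepts it; evaluation stops at the first rejection.
final class UrlFilterChain implements UrlFilter {
    private final List<UrlFilter> filters;

    UrlFilterChain(List<UrlFilter> filters) {
        this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
    }

    @Override
    public boolean filter(String url, String parent) {
        for (UrlFilter f : filters) {
            if (!f.filter(url, parent)) {
                return false; // short-circuit on the first rejecting filter
            }
        }
        return true;
    }
}

// Hypothetical plugin-service step: hand out a single active filter directly,
// and wrap several active filters in a chain.
final class PluginServiceSketch {
    static UrlFilter resolve(List<UrlFilter> active) {
        if (active.isEmpty()) {
            return (url, parent) -> true; // assumption: no filters enabled means accept all
        }
        if (active.size() == 1) {
            return active.get(0);
        }
        return new UrlFilterChain(active);
    }
}
```

Because the chain itself implements `UrlFilter`, code that previously expected exactly one filter needs no changes when a second one is enabled.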

Will it close an existing issue?
Closes #147

thammegowda commented 6 years ago

Enabled urlfilter-regex and urlfilter-samehost by default.
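To illustrate what enabling both filters together means, here is a sketch under stated assumptions: a regex filter that rejects URLs matching any exclude pattern, and a same-host filter that keeps a URL only if its host equals the parent page's host, combined with AND semantics. The classes `RegexUrlFilter`, `SameHostUrlFilter`, and `DefaultFilters` are hypothetical and do not reproduce the real plugins' code or default patterns.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical filter interface; NOT Sparkler's real URLFilter signature.
interface UrlFilter {
    boolean filter(String url, String parent);
}

// Hypothetical stand-in for urlfilter-regex: reject a URL if any exclude pattern matches it.
final class RegexUrlFilter implements UrlFilter {
    private final List<Pattern> excludes;

    RegexUrlFilter(List<Pattern> excludes) { this.excludes = excludes; }

    @Override
    public boolean filter(String url, String parent) {
        for (Pattern p : excludes) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical stand-in for urlfilter-samehost: keep a URL only if it is on the parent's host.
final class SameHostUrlFilter implements UrlFilter {
    private static String host(String u) {
        try {
            String h = new URI(u).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT); // hosts are case-insensitive
        } catch (URISyntaxException e) {
            return null; // unparseable URLs have no comparable host
        }
    }

    @Override
    public boolean filter(String url, String parent) {
        String h = host(url);
        return h != null && h.equals(host(parent));
    }
}

// With both filters enabled, a URL is kept only if every filter accepts it.
final class DefaultFilters {
    static boolean accept(List<UrlFilter> filters, String url, String parent) {
        return filters.stream().allMatch(f -> f.filter(url, parent));
    }
}
```

For a parent of `http://example.com/index.html` and an exclude pattern for image extensions, `http://example.com/about.html` would pass, while `http://example.com/logo.png` (regex) and `http://other.org/about.html` (host) would each be dropped by one filter.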