USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
410 stars, 143 forks

Add fetcher-default as a plugin #181

Open balashashanka opened 4 years ago

balashashanka commented 4 years ago

Currently, fetcher-default lives in the package edu.usc.irds.sparkler.util. This issue is to move it into a plugin.

balashashanka commented 4 years ago

I have started working on this and will create a pull request soon.

thammegowda commented 4 years ago

Thanks. We kept the default implementation in sparkler-app itself to keep a few things simple.

FYI, there are already three fetcher plugins if you want to test plugin loading (without having to develop a new one): https://github.com/USCDataScience/sparkler/tree/master/sparkler-plugins

That being said, all improvements are welcome. I am just trying to make it easy for you.

balashashanka commented 4 years ago

Hi, I saw this as one of the TODOs in the code, so I thought it would be a good start. But I could work on other things if this plugin is not a priority.

thammegowda commented 4 years ago

You were facing an issue with deploying plugins on a Sparkler cluster, right?
I suggest resolving that first.