USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Define interfaces for the plugins #20

Closed: thammegowda closed this issue 4 years ago

thammegowda commented 8 years ago

PS: this is an incomplete list.

sujen1412 commented 7 years ago

@thammegowda, where can I find the code for the completed issues? I could use it as an example before starting development of a new plugin. Is there a commit linked to each issue number?

thammegowda commented 7 years ago

We don't have proper documentation at the moment (I've added it to my to-do list; thanks for reporting).

Meanwhile, here is how I added a new plugin: https://github.com/USCDataScience/sparkler/pull/100

If you need to define a new plugin, the interfaces and base classes here can serve as examples: https://github.com/USCDataScience/sparkler/tree/master/sparkler-api/src/main/java/edu/usc/irds/sparkler

TODO: update the wiki with proper docs.
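Until the wiki is updated, here is a minimal sketch of the general "interface plus base class" plugin pattern mentioned above. It is not Sparkler's API: every name here (`Plugin`, `UrlFilterPlugin`, `AbstractPlugin`, `SchemeUrlFilter`, `UrlFilterChain`) is hypothetical, and a URL filter is assumed as the example extension point. The interface defines what the crawler calls, the abstract base class absorbs shared boilerplate (here, storing the plugin id), and a concrete plugin only implements the extension-specific method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the interface + base class plugin pattern.
// None of these names come from the Sparkler codebase.
class PluginSketchDemo {
    public static void main(String[] args) {
        UrlFilterChain chain = new UrlFilterChain();
        chain.register("scheme-filter", new SchemeUrlFilter());
        System.out.println(chain.accept("https://example.com/")); // true
        System.out.println(chain.accept("ftp://example.com/"));   // false
    }
}

/** Contract shared by every plugin: an assumed one-time init hook. */
interface Plugin {
    void init(String pluginId);
}

/** A hypothetical extension point: decides whether a URL should be crawled. */
interface UrlFilterPlugin extends Plugin {
    boolean accept(String url);
}

/** Base class holding common state so concrete plugins stay small. */
abstract class AbstractPlugin implements Plugin {
    protected String pluginId;

    @Override
    public void init(String pluginId) {
        this.pluginId = pluginId;
    }

    public String getPluginId() {
        return pluginId;
    }
}

/** Concrete plugin: accepts only http and https URLs. */
class SchemeUrlFilter extends AbstractPlugin implements UrlFilterPlugin {
    @Override
    public boolean accept(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }
}

/** Runs all registered filters; a URL passes only if every filter accepts it. */
class UrlFilterChain {
    private final List<UrlFilterPlugin> filters = new ArrayList<>();

    void register(String id, UrlFilterPlugin filter) {
        filter.init(id); // initialize before the filter is ever used
        filters.add(filter);
    }

    boolean accept(String url) {
        for (UrlFilterPlugin f : filters) {
            if (!f.accept(url)) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping shared state in the base class means a new filter only overrides `accept`, while the chain depends solely on the `UrlFilterPlugin` interface.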

micheladennis commented 6 years ago

Quick question: what is the parser extension point?

thammegowda commented 6 years ago

@micheladennis Sorry, there is no extension point for the Parser at the moment. This video may help with adding a new plugin: https://www.youtube.com/watch?v=Ib8OwmoRj-Q
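Because the Parser had no extension point at the time, the following is only a hypothetical sketch of what one could look like, not Sparkler code. All names (`ParserExtension`, `ParseResult`, `ParserRegistry`, `LinkExtractingHtmlParser`) are assumptions. Each parser plugin declares which content types it handles, and a registry dispatches each fetched document to the first parser that accepts its type. The regex-based HTML parser is a toy for illustration; a real plugin should use a proper HTML parsing library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser extension point; these names are NOT from Sparkler.
class ParserSketchDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new LinkExtractingHtmlParser());
        ParseResult result = registry.parse("text/html; charset=utf-8",
                "<a href=\"https://example.com/a\">A</a> <a href='/b'>B</a>");
        System.out.println(result.getText());     // A B
        System.out.println(result.getOutlinks()); // [https://example.com/a, /b]
    }
}

/** Output of a parse: visible text plus outgoing links (assumed shape). */
final class ParseResult {
    private final String text;
    private final List<String> outlinks;

    ParseResult(String text, List<String> outlinks) {
        this.text = text;
        this.outlinks = Collections.unmodifiableList(new ArrayList<>(outlinks));
    }

    String getText() { return text; }
    List<String> getOutlinks() { return outlinks; }
}

/** The extension point: a plugin states which content types it can parse. */
interface ParserExtension {
    boolean canParse(String contentType);
    ParseResult parse(String content);
}

/** Toy HTML parser using regexes; adequate for a demo, not for real-world HTML. */
class LinkExtractingHtmlParser implements ParserExtension {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    @Override
    public boolean canParse(String contentType) {
        // Prefix match so "text/html; charset=utf-8" is also accepted.
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
    }

    @Override
    public ParseResult parse(String content) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            links.add(m.group(1));
        }
        // Drop tags, then collapse runs of whitespace into single spaces.
        String text = content.replaceAll("<[^>]*>", "").replaceAll("\\s+", " ").trim();
        return new ParseResult(text, links);
    }
}

/** Dispatches content to the first registered parser that accepts its type. */
class ParserRegistry {
    private final List<ParserExtension> parsers = new ArrayList<>();

    void register(ParserExtension parser) {
        parsers.add(parser);
    }

    ParseResult parse(String contentType, String content) {
        for (ParserExtension p : parsers) {
            if (p.canParse(contentType)) {
                return p.parse(content);
            }
        }
        throw new IllegalArgumentException("No parser for content type: " + contentType);
    }
}
```

Dispatching on content type lets new formats (for example, a PDF parser) be added as separate plugins without touching the registry.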