USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

SPARKLER-105: Define a score plugin interface (#105) #133

Closed by giuseppetotaro 6 years ago

giuseppetotaro commented 6 years ago

This PR adds a scorer interface to Sparkler (#105). The interface is the contract that every scoring plugin used within Sparkler must implement.

I performed testing on my own laptop.
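To illustrate what a "contract for each scoring plugin" could look like, here is a minimal sketch. It is not the interface from this PR: the names `PageScorer`, `scoreKey`, `score`, and `KeywordScorer` are hypothetical, chosen only to show the general shape of a scorer interface with one example implementation.

```java
// Hypothetical contract (NOT Sparkler's actual API): every scoring plugin
// turns a fetched page into a numeric score.
interface PageScorer {
    // Name of the field the score would be stored under (assumed for illustration).
    String scoreKey();

    // Returns a score for the page; higher means more relevant.
    double score(String url, String extractedText);
}

// Example plugin: the score is the fraction of tokens equal to a keyword.
final class KeywordScorer implements PageScorer {
    private final String keyword;

    KeywordScorer(String keyword) {
        this.keyword = keyword.toLowerCase();
    }

    @Override
    public String scoreKey() {
        return "keyword_score";
    }

    @Override
    public double score(String url, String extractedText) {
        if (extractedText == null || extractedText.isEmpty() || keyword.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        int total = 0;
        // Split on runs of non-word characters and skip empty tokens.
        for (String token : extractedText.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            total++;
            if (token.equals(keyword)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Under this sketch, a crawler would depend only on `PageScorer`, so a different scoring strategy could be swapped in without touching the crawl code.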

thammegowda commented 6 years ago

Good work, @giuseppetotaro. Thanks for taking the time to work on this. I made a few minor suggestions; otherwise it's great. Let me know if you have questions about my suggestions or need any help making those changes.

thammegowda commented 6 years ago

Thanks, @giuseppetotaro. I will review this and merge it this weekend.

chrismattmann commented 6 years ago

+1 to commit from me. Great work, @giuseppetotaro @thammegowda @sujen1412.

chrismattmann commented 6 years ago

Team, what is the status of this PR? @giuseppetotaro @thammegowda @sujen1412, is it ready to commit?

thammegowda commented 6 years ago

@chrismattmann The bug mentioned in https://github.com/USCDataScience/sparkler/pull/133#discussion_r147548360 should be addressed before this is committed. It breaks the crawl workflow.