USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
410 stars, 143 forks

Make storage engine pluggable #196

Closed: buggtb closed this issue 3 years ago

buggtb commented 3 years ago

The storage engine should support both Solr and Elastic out of the box (OOTB), but I'm also thinking about JSON support in Postgres, etc., for other folks.
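To make the idea concrete, here is a minimal sketch of what a pluggable storage layer could look like. It is written in Python only for brevity and is not Sparkler's actual code: `StorageBackend`, `InMemoryBackend`, `register_backend`, and `create_backend` are all hypothetical names, and the in-memory backend merely stands in for a real Solr, Elastic, or Postgres adapter.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class StorageBackend(ABC):
    """Hypothetical interface that every storage engine would implement."""

    @abstractmethod
    def upsert(self, doc_id: str, doc: dict) -> None:
        """Insert the document, or overwrite it if doc_id already exists."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]:
        """Return the stored document, or None if doc_id is unknown."""


class InMemoryBackend(StorageBackend):
    """Stand-in backend; a real one would talk to Solr, Elastic, Postgres, etc."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc_id: str, doc: dict) -> None:
        # Store a shallow copy so later changes by the caller do not leak in.
        self._docs[doc_id] = dict(doc)

    def get(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)


# Registry mapping a configuration name to a factory for that backend.
_REGISTRY: Dict[str, Callable[[], StorageBackend]] = {}


def register_backend(name: str, factory: Callable[[], StorageBackend]) -> None:
    """Make a backend selectable by name (hypothetical helper)."""
    _REGISTRY[name] = factory


def create_backend(name: str) -> StorageBackend:
    """Instantiate the backend chosen in configuration."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown storage backend {name!r}; known: {known}") from None
    return factory()


register_backend("memory", InMemoryBackend)
```

With something along these lines, the crawler would depend only on the interface, and switching engines would come down to a configuration value passed to `create_backend`.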

lewismc commented 3 years ago

Hi @buggtb, OK, I only just saw this. We are of course currently working on the StorageProxy and StorageClient implementations. It reminded me of all the work I've done on Apache Gora over the years, but I thought it might be overkill at this time. What are your thoughts on Gora? Have you used it?

lewismc commented 3 years ago

Superseded by https://github.com/USCDataScience/sparkler/issues/218