USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Writing Data to Elasticsearch Storage Engine #224

Open Kefaun2601 opened 3 years ago

Kefaun2601 commented 3 years ago

Task Description

This task is currently in progress to provide Elasticsearch as a backend storage engine option for Sparkler. It builds on the Factory Pattern outlined in Issue 218, where storage engine-specific implementation is abstracted behind a common interface.
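For context, a minimal sketch of what that abstraction could look like. Note this is an illustrative assumption, not Sparkler's actual API: only `commitCrawlDb()` is referenced later in this thread, while the trait name, `addDocuments()`, and the factory object are hypothetical.

```scala
// Sketch of the Issue 218 Factory Pattern (hypothetical names except
// commitCrawlDb, which is mentioned later in this thread).
trait StorageProxy {
  def addDocuments(docs: Seq[Map[String, Any]]): Unit // buffer docs for the backend
  def commitCrawlDb(): Unit                           // flush/commit the crawl database
}

class SolrProxy extends StorageProxy {
  def addDocuments(docs: Seq[Map[String, Any]]): Unit = { /* Solr-specific write */ }
  def commitCrawlDb(): Unit = { /* Solr hard commit */ }
}

class ElasticsearchProxy extends StorageProxy {
  def addDocuments(docs: Seq[Map[String, Any]]): Unit = { /* ES bulk index */ }
  def commitCrawlDb(): Unit = { /* ES index refresh */ }
}

object StorageProxyFactory {
  // Select a backend by name so the crawler never touches a concrete engine.
  def getProxy(backend: String): StorageProxy = backend.toLowerCase match {
    case "elasticsearch" => new ElasticsearchProxy()
    case _               => new SolrProxy() // Solr remains the default
  }
}
```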

To reach the final goal of writing Sparkler data into the Elasticsearch storage engine, the team envisions the following steps:

  1. Make sure the Elasticsearch storage engine is set up appropriately and ready to accept data
  2. Write simple data to Elasticsearch
     a. Perhaps a simple visualization to prove functionality
  3. Reorganize Sparkler data into a format conducive to Elasticsearch indexing
  4. Write data into Elasticsearch (see the sketch after this list)
  5. Visualize data in Elasticsearch (this will likely be brought up in a future issue)
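As a proof of concept for steps 2 through 4, one possible path is the elasticsearch-hadoop Spark connector, which can index an RDD of maps directly. A minimal sketch, assuming a local Elasticsearch on port 9200; the `sparkler-crawldb` index name and the document fields are illustrative, not Sparkler's actual schema. It requires the elasticsearch-spark artifact (matching the Spark and Scala versions) on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-hadoop's Spark integration

object EsWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("sparkler-es-sketch")
      .setMaster("local[*]")
      .set("es.nodes", "localhost") // assumption: a local ES cluster
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Step 3: reshape crawl data into flat key/value documents ES can index
    val docs = Seq(
      Map("url" -> "http://irds.usc.edu/sparkler/", "status" -> "FETCHED", "depth" -> 0),
      Map("url" -> "http://example.com/", "status" -> "UNFETCHED", "depth" -> 1)
    )

    // Steps 2 & 4: write the documents to a hypothetical "sparkler-crawldb" index
    sc.makeRDD(docs).saveToEs("sparkler-crawldb")

    sc.stop()
  }
}
```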

This is a WIP and updates will be posted here as we make progress.

slhsxcmy commented 3 years ago

@thammegowda @buggtb @lewismc We had a few questions about Crawler.scala while adding Elasticsearch:

  1. How does a deep crawl differ from a "normal" crawl? It looks like the deep crawl runs only when the -dc flag is enabled, while the normal crawl always runs. Is that correct?
  2. What does the FairFetcher class do? Do we need to understand it, given that FairFetcher is not specific to Solr?
  3. Why is "storageProxy.commitCrawlDb()" called before the crawl, after the deep crawl, and then again after the normal crawl?