USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Debugging Elasticsearch Connection #229

Open Kefaun2601 opened 3 years ago

Kefaun2601 commented 3 years ago

Task Description

Most of the Elasticsearch implementation is already written. Two major problems remain:

  1. ElasticsearchResultIterator needs to implement deserialize(). We are having an issue with creating an instance of a generic type.
  2. Debug data persistence to make sure that the data in Elasticsearch is being updated properly.
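
For the first problem, one common workaround on the JVM is to pass a runtime class token into the iterator, because type erasure means `new T()` does not compile. The sketch below is only an illustration under assumptions: `PageRecord` and `TypedHitIterator` are hypothetical names, not Sparkler classes, and each raw hit is assumed to arrive as a map of field name to value. It shows the class-token pattern on its own, without any Elasticsearch client calls.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a crawl record; not Sparkler's actual bean class.
class PageRecord {
    String id;
    String url;
    int fetchDepth;

    PageRecord() {} // no-arg constructor needed by the reflective instantiation below
}

// Hypothetical iterator: wraps raw hits (field name -> value) and yields typed beans.
class TypedHitIterator<T> implements Iterator<T> {
    private final Iterator<Map<String, Object>> hits;
    private final Class<T> beanType; // runtime type token, since `new T()` is not allowed

    TypedHitIterator(Iterator<Map<String, Object>> hits, Class<T> beanType) {
        this.hits = hits;
        this.beanType = beanType;
    }

    @Override
    public boolean hasNext() {
        return hits.hasNext();
    }

    @Override
    public T next() {
        return deserialize(hits.next());
    }

    // Creates a T through the class token and copies matching fields from the hit.
    T deserialize(Map<String, Object> source) {
        try {
            T bean = beanType.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : source.entrySet()) {
                Field field;
                try {
                    field = beanType.getDeclaredField(entry.getKey());
                } catch (NoSuchFieldException e) {
                    continue; // skip keys the bean does not declare
                }
                field.setAccessible(true);
                field.set(bean, entry.getValue());
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot deserialize into " + beanType.getName(), e);
        }
    }
}

class GenericDeserializeSketch {
    public static void main(String[] args) {
        Map<String, Object> hit = new LinkedHashMap<>();
        hit.put("id", "doc-1");
        hit.put("url", "http://example.com/");
        hit.put("fetchDepth", 2);
        hit.put("unmappedKey", "ignored");

        List<Map<String, Object>> hits = new ArrayList<>();
        hits.add(hit);

        Iterator<PageRecord> records = new TypedHitIterator<>(hits.iterator(), PageRecord.class);
        while (records.hasNext()) {
            PageRecord r = records.next();
            System.out.println(r.id + " " + r.url + " depth=" + r.fetchDepth);
        }
    }
}
```

The caller supplies `PageRecord.class` once at construction time, so `deserialize()` never needs to know the concrete type. If the hits are available as JSON strings, a JSON mapper that accepts a `Class<T>` argument (for example Jackson's `ObjectMapper.readValue`) could replace the hand-written reflection loop; whether that fits the existing iterator is an open question here.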

Updates will be posted as progress is made.

Related PR

#225