gbif / stackable

GBIF Stackable Infrastructure
Apache License 2.0

Test: Map build (HDFS, Spark, HBase, ZK) #1

Closed by timrobertson100 1 year ago

timrobertson100 commented 1 year ago

This test will evaluate the use of Stackable to run the GBIF map build job.

A custom Spark job will process coordinates found in the GBIF occurrence data into views stored as HFiles on HDFS that are suitable for subsequent bulk loading into HBase. A simple Java-based vector tile server sitting on HBase will serve map views to the browser.
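For illustration only, the core idea of turning a coordinate into a tile address can be sketched as below. This is a minimal sketch using the standard Web Mercator "slippy map" tile formula. The function names and the `zoom:x:y` key layout are hypothetical and are not taken from the GBIF map build code, whose actual projection handling and HBase key schema are not described in this issue.

```python
import math

# Latitude limit of the Web Mercator projection, in degrees.
MAX_LAT = 85.05112878


def tile_for(lat, lng, zoom):
    """Return the (x, y) tile containing a WGS84 coordinate at a zoom level."""
    n = 1 << zoom  # number of tiles along each axis at this zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # clamp to the projectable range
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lng == 180 or lat at the limit stays inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def row_key(lat, lng, zoom):
    """Build a hypothetical 'zoom:x:y' key for grouping points by tile."""
    x, y = tile_for(lat, lng, zoom)
    return f"{zoom}:{x}:{y}"
```

A job along these lines would group occurrence records by such a key before writing them out. How the real job encodes its tiles and keys may differ.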

It is envisaged that this test scenario will cover:

The same procedure should be runnable on a laptop.

Once complete we expect to have gained an understanding of the configuration in Stackable and what is required to adapt our code to the later versions of HDFS, Spark, and HBase (Edited to add: we will update our code to run on the latest versions of everything).

fmendezh commented 1 year ago

Be aware that Stackable supports only Spark 3.2.1 and 3.3.0, whereas our current code works with Spark 2.3.0. See https://docs.stackable.tech/spark-k8s/stable/index.html

zaultooz commented 1 year ago

Commit https://github.com/gbif/stackable/commit/184be0338bb51cf3f204ee1620d05ea6d46a50d5 contains the generic charts used for creating the cluster and deploying HDFS, ZooKeeper, HBase, the Spark application, and the vector tile server.

Environment-specific information has been left out.