gbif / stackable

GBIF Stackable Infrastructure
Apache License 2.0

Propose an initial Yunikorn architecture #22

Closed fmendezh closed 4 months ago

fmendezh commented 8 months ago

We should plan a sensible capacity architecture for Yunikorn that takes the following components into account:

  1. Ingestion jobs: those that run as pure Java applications (small) and those that require Spark (large), each with different resource demands.
  2. Occurrence downloads: small downloads (multi-threaded Java apps using Elasticsearch) and big downloads (using dynamic settings that depend on the amount of data to process).
  3. Batch and scheduled tasks: maps, table builds, gridded datasets, grscicoll cache, analytics, etc.
  4. Infrastructure elements: HDFS, HBase, Trino, Airflow, Zookeeper, Spark, Vector Aggregator, Hive Metastore
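
As a rough way to reason about the four component groups above, here is a minimal Python sketch that models them as a hierarchy of queues with guaranteed and maximum resources, then checks the hierarchy for internal consistency. Everything in it is an assumption made for illustration: the queue names, the `Queue` class, the `check` helper, and all numbers are hypothetical and do not reflect the actual GBIF cluster or its Yunikorn configuration. The consistency rules are the sketch's own (guaranteed at most max, children's guarantees fitting inside the parent's, child max at most parent max).

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical queue model: a name, guaranteed and maximum resources."""
    name: str
    guaranteed: dict  # e.g. {"vcore": 8, "memory_gib": 32}
    maximum: dict
    children: list = field(default_factory=list)


def check(queue, path=""):
    """Return a list of human-readable problems found in the queue tree."""
    full = f"{path}.{queue.name}" if path else queue.name
    problems = []
    # A queue should not guarantee more than its own maximum.
    for res, cap in queue.maximum.items():
        if queue.guaranteed.get(res, 0) > cap:
            problems.append(f"{full}: guaranteed {res} exceeds its max")
    # The children's guarantees should fit inside the parent's guarantee.
    for res, cap in queue.guaranteed.items():
        total = sum(c.guaranteed.get(res, 0) for c in queue.children)
        if total > cap:
            problems.append(f"{full}: children guarantee {total} {res}, parent has {cap}")
    for child in queue.children:
        # A child should not be allowed to grow past its parent's maximum.
        for res, cap in queue.maximum.items():
            if child.maximum.get(res, 0) > cap:
                problems.append(f"{full}.{child.name}: max {res} exceeds parent max")
        problems.extend(check(child, full))
    return problems


def q(name, g_vcore, g_mem, m_vcore, m_mem, children=()):
    """Shorthand constructor; all values passed below are made-up numbers."""
    return Queue(name,
                 {"vcore": g_vcore, "memory_gib": g_mem},
                 {"vcore": m_vcore, "memory_gib": m_mem},
                 list(children))


# Hypothetical layout mirroring the four component groups in the list above.
ROOT = q("root", 100, 400, 100, 400, [
    q("ingestion", 30, 120, 60, 240, [
        q("java", 5, 20, 10, 40),       # small pure-Java ingestion jobs
        q("spark", 25, 100, 60, 240),   # large Spark ingestion jobs
    ]),
    q("downloads", 20, 80, 50, 200, [
        q("small", 5, 20, 10, 40),      # Elasticsearch-backed downloads
        q("big", 15, 60, 50, 200),      # large, dynamically sized downloads
    ]),
    q("batch", 10, 40, 40, 160),        # maps, table builds, analytics, ...
    q("infrastructure", 40, 160, 40, 160),
])

if __name__ == "__main__":
    for problem in check(ROOT):
        print(problem)
```

In this sketch the small queues have tight maximums so they cannot starve the rest, while the Spark and big-download queues have low guarantees but high maximums so they can borrow idle capacity. Whether that trade-off suits the real workloads is a judgment call the sketch does not settle.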

zaultooz commented 4 months ago

The initial queue configuration can be found here:

https://github.com/gbif/gbif-helmfile-configuration/blob/main/config/yunikorn/values.yaml#L122