ExpediaGroup / waggle-dance

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Apache License 2.0

Added extra option to add readOnly thrift HMS uri #308

Closed patduin closed 7 months ago

patduin commented 8 months ago

TL;DR: Split traffic based on the HMS API method being called, e.g. getTable goes to a ReadOnly HMS and alterTable goes to the ReadWrite HMS.
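As a rough illustration of the idea (not the code in this PR), method-based routing can be sketched with a JDK dynamic proxy. Everything named here is hypothetical: `MetastoreClient` stands in for the real metastore client interface, and the `get`/`list` prefix rule is an assumed way of classifying read-only calls.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

class ReadWriteSplitSketch {

  /** Hypothetical stand-in for the metastore client interface. */
  public interface MetastoreClient {
    String getTable(String db, String table);
    String alterTable(String db, String table);
  }

  // Assumption: read-only calls can be recognised by a method-name prefix.
  static final List<String> READ_PREFIXES = List.of("get", "list");

  static boolean isReadOnly(String methodName) {
    return READ_PREFIXES.stream().anyMatch(methodName::startsWith);
  }

  /** Returns a client that sends reads to readOnly and everything else to readWrite. */
  static MetastoreClient route(MetastoreClient readOnly, MetastoreClient readWrite) {
    InvocationHandler handler = (proxy, method, args) -> {
      MetastoreClient target = isReadOnly(method.getName()) ? readOnly : readWrite;
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        // Rethrow the underlying exception instead of the reflection wrapper.
        throw e.getCause();
      }
    };
    return (MetastoreClient) Proxy.newProxyInstance(
        MetastoreClient.class.getClassLoader(),
        new Class<?>[] {MetastoreClient.class},
        handler);
  }

  /** Test double that reports which instance handled the call. */
  static MetastoreClient labelled(String label) {
    return new MetastoreClient() {
      @Override
      public String getTable(String db, String table) {
        return label + ":getTable";
      }

      @Override
      public String alterTable(String db, String table) {
        return label + ":alterTable";
      }
    };
  }

  public static void main(String[] args) {
    MetastoreClient client = route(labelled("readOnly"), labelled("readWrite"));
    System.out.println(client.getTable("db", "tbl"));
    System.out.println(client.alterTable("db", "tbl"));
  }
}
```

Anything that does not match a read prefix falls through to the ReadWrite instance, so an unclassified method is never sent to an endpoint that cannot serve writes.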

The problem addressed here is running Waggle Dance (WD) at scale. Our company generally deploys Waggle Dance as part of an Apiary data lake: https://github.com/ExpediaGroup/apiary-data-lake. This involves deploying ReadOnly and ReadWrite metastores (HMS). For the primary (local) metastore, Waggle Dance is configured to use the ReadWrite instance, which connects to a ReadWrite RDS backend. This means all traffic, both reads and writes, ends up on our ReadWrite RDS instance. This PR tries to split that traffic and move read traffic to the ReadOnly instance. The benefit would be: