TL;DR: Split traffic based on the HMS API method being called, e.g. getTable goes to a ReadOnly HMS and alterTable goes to a ReadWrite HMS.
The problem addressed here is running Waggle Dance (WD) at scale. Our company generally deploys Waggle Dance as part of an Apiary data lake: https://github.com/ExpediaGroup/apiary-data-lake.
This involves deploying ReadOnly and ReadWrite Hive Metastores (HMS).
For the primary (local) metastore, Waggle Dance is configured to point at the ReadWrite instance, which connects to a ReadWrite RDS backend. This means all traffic, both reads and writes, ends up on our ReadWrite RDS instance. This PR tries to split that traffic and move read traffic to the ReadOnly instance.
The benefits would be:
We can easily scale ReadOnly RDS instances to handle more load.
Traffic is redirected automatically, without user configuration changes. Many ETL jobs do both reads and writes as part of their workflow, so it has proven difficult for users to switch fully to a ReadOnly instance; this PR makes the decision for them.
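To illustrate the idea of routing by method name, here is a minimal sketch using a JDK dynamic proxy. It is not the code in this PR: the `Metastore` interface, the `READ_METHODS` allow-list, and the `backend` helper are hypothetical stand-ins, and both methods return a string naming the backend that handled the call purely so the routing is observable.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

public class ReadWriteSplitSketch {

  // Hypothetical, trimmed-down stand-in for the HMS client interface.
  public interface Metastore {
    String getTable(String db, String table);
    String alterTable(String db, String table);
  }

  // Hypothetical allow-list of methods treated as reads.
  // Anything not listed falls through to the ReadWrite backend.
  static final Set<String> READ_METHODS = Set.of("getTable");

  public static boolean isReadOnly(String methodName) {
    return READ_METHODS.contains(methodName);
  }

  // Returns a proxy that picks a backend per call, based on the method name.
  public static Metastore splitting(Metastore readOnly, Metastore readWrite) {
    InvocationHandler handler = (proxy, method, args) -> {
      Metastore target = isReadOnly(method.getName()) ? readOnly : readWrite;
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        // Rethrow the backend's own exception instead of the reflection wrapper.
        throw e.getCause();
      }
    };
    return (Metastore) Proxy.newProxyInstance(
        Metastore.class.getClassLoader(), new Class<?>[] {Metastore.class}, handler);
  }

  // Fake backend for the demo: reports its own name with each call.
  public static Metastore backend(String name) {
    return new Metastore() {
      @Override
      public String getTable(String db, String table) {
        return name + ":getTable";
      }

      @Override
      public String alterTable(String db, String table) {
        return name + ":alterTable";
      }
    };
  }

  public static void main(String[] args) {
    Metastore ms = splitting(backend("readOnly"), backend("readWrite"));
    System.out.println(ms.getTable("db", "t"));   // prints "readOnly:getTable"
    System.out.println(ms.alterTable("db", "t")); // prints "readWrite:alterTable"
  }
}
```

The sketch uses an allow-list rather than a name-prefix check so that any method not explicitly known to be a read defaults to the ReadWrite instance, which seems the safer failure mode.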