Closed rajeshsaluja closed 6 years ago
Waggle Dance only interacts with Hive on the metadata level. The metastores in the federation are simply feeding raw data locations to your Hive execution engine. All actual data manipulation is done on the cluster from which you fire your query. No additional movement of data takes place prior to query execution.
What this mean in practice is that the cluster on which you are running the query requires access to the file stores referenced in the table metadata of your source tables, wherever they may be. For large tables this also implies that you'll require adequate bandwidth to pull the underlying table data across to your execution engine.
Our architectures are normally doing this in a single cloud provider, between different clusters and accounts in the same region where such access is plentiful. We have also experimented with federations across regions.
Great . Thanks on the clarification !
We are doing experimentation across regions with one of the region in our own data center and other in AWS.
I've question on Filter, Sorting, Joins, Merge through waggle-dance
E.g.
select * from remotedb.emp where created_date > sysdate - 365
through waggle-dance (emp
table exists in remote hive metastore)Does above statement does filter at remote metastore and then waggle dance get the data or entire table data is pulled before applying filter condition.
Similar question for sorting or joins of tables across metastore. Where does the actual operation is carried out.
Thanks