cloudera-labs / hms-mirror

"hms-mirror" is a utility used to bridge the gap between two clusters and migrate hive metadata.
Apache License 2.0

EXPORT_IMPORT data duplication with subsequent runs #97

Closed by dstreev 7 months ago

dstreev commented 9 months ago

By default, the IMPORT process doesn't DROP existing data, so each additional run appends to the current dataset.

If you're using this process to OVERWRITE an existing table, you may not get the results you expect.

dstreev commented 7 months ago

Further research shows that this is normal behavior for the Hive EXPORT_IMPORT process. Precautions should be taken to 'reload' the data. hms-mirror is primarily a migration / one-time-use tool and doesn't check existing data for these conditions.
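
One possible 'reload' precaution is to drop the target table before re-running the import, so the second run starts from an empty table. The sketch below only builds the HiveQL statements; it is not part of hms-mirror. The helper name `build_reload_statements` and the example database, table, and path are hypothetical, and it assumes a managed table, where DROP TABLE also removes the underlying data.

```python
from typing import List


def build_reload_statements(database: str, table: str, export_path: str) -> List[str]:
    """Build HiveQL that drops the target table before importing it again.

    Hypothetical helper for illustration only; not an hms-mirror API.
    Assumes a managed table, so DROP TABLE also removes the table's data.
    """
    return [
        f"USE `{database}`",
        # Remove the previous copy so the import does not add to it.
        f"DROP TABLE IF EXISTS `{table}`",
        # Recreate the table and load its data from the exported directory.
        f"IMPORT TABLE `{table}` FROM '{export_path}'",
    ]


if __name__ == "__main__":
    # Example values are made up; substitute your own database, table, and path.
    for stmt in build_reload_statements("sales", "orders", "/tmp/export/sales/orders"):
        print(stmt + ";")
```

The printed statements can be reviewed and then run on the target cluster through whatever Hive client you already use. Dropping the table is destructive, so this only makes sense when the exported copy is the intended source of truth.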