cloudera-labs / hms-mirror

"hms-mirror" is a utility used to bridge the gap between two clusters and migrate hive metadata.
Apache License 2.0

Keep the purge option when syncing schemas #104

Open dvergari opened 6 months ago

dvergari commented 6 months ago

As of now, the only way to drop tables on the RIGHT cluster that do not exist on the LEFT is to use the --sync option. However, when we're converting a legacy managed table to an external one, --sync does not set the PURGE option, potentially leaving unwanted data on the RIGHT cluster.

dstreev commented 5 months ago

Adding the purge back in this scenario might cause issues with a schema update, since that process drops and recreates the table. If the purge flag were set, we'd inadvertently remove the data.

What if we built an extra 'post' run file that issued hdfs dfs -rm -r -f commands when a RIGHT side schema meets this 'drop' scenario?
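To make the 'post' run file idea concrete, here is a minimal sketch of how such a file could be generated. It is not part of hms-mirror: the function name `build_post_cleanup` and the assumption that the storage locations of the dropped RIGHT-side tables are available as a list of strings are both hypothetical.

```python
import shlex


def build_post_cleanup(locations):
    """Return shell script text that removes each given table location.

    `locations` is assumed (hypothetically) to hold the storage paths of
    RIGHT-side tables that were dropped during a sync.
    """
    lines = [
        "#!/usr/bin/env bash",
        "# Generated cleanup file. Review every path before running it.",
    ]
    for loc in sorted(set(locations)):
        # Skip empty entries and the filesystem root as a basic safety guard.
        if loc.strip() in ("", "/"):
            continue
        # shlex.quote protects paths that contain spaces or shell metacharacters.
        lines.append(f"hdfs dfs -rm -r -f {shlex.quote(loc)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_post_cleanup(["hdfs://RIGHT/warehouse/db.db/t1"]), end="")
```

Writing the deletes to a separate file, rather than running them during the sync, keeps the data removal as an explicit, reviewable step, which avoids the accidental data loss described above for the drop-and-recreate schema update.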