dstreev / hms-mirror

Copy Hive table definitions to the Compute Cluster, while still using Storage on the original cluster
Apache License 2.0

Support intermediate storage option for avro-schema-migration #52

Open laurencedaluz opened 2 years ago

laurencedaluz commented 2 years ago

The --avro-schema-migration flag requires a "linked" cluster so that it can copy the Avro schema file to the target cluster. It would be great if this scenario also worked with the intermediate storage flag (instead of a "linked" cluster), as this would simplify migrating Avro tables to the public cloud.

dstreev commented 2 years ago

The current problem is that the AVRO schema file is moved through the HCFS client, which relies on the host being configured with the cluster's core-site.xml and hdfs-site.xml files. In this case, you only have access to the side you're running it on, not the other side.

dstreev commented 2 years ago

Now, if we used 'common-storage', we could make it work, because that's where the data will live too. The other cluster could then just reference the location where the source cluster placed the schema file. In this scenario hms-mirror is run from the LEFT cluster, which is usually the case because of visibility.
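
A minimal sketch of the 'common-storage' idea, assuming the schema file keeps its original path and only the scheme and authority change. This is not hms-mirror code: the function names, the `LEFTNS` nameservice, and the `s3a://my-bucket` location are hypothetical and only for illustration. The one real name is `avro.schema.url`, the Hive table property that points at the schema file.

```python
from urllib.parse import urlparse


def relocate_schema_url(schema_url: str, common_storage: str) -> str:
    """Map an avro.schema.url from the LEFT cluster onto common storage.

    Hypothetical helper: keeps the original path and swaps only the
    scheme and authority for the common-storage prefix.
    """
    source = urlparse(schema_url)
    if not source.path:
        raise ValueError(f"schema URL has no path: {schema_url!r}")
    return common_storage.rstrip("/") + source.path


def alter_table_statement(table: str, new_url: str) -> str:
    """Build the Hive DDL the RIGHT cluster could run to reference the copy."""
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('avro.schema.url'='{new_url}')"
    )


if __name__ == "__main__":
    # Assumed example values, not taken from the issue.
    new_url = relocate_schema_url(
        "hdfs://LEFTNS/schemas/events.avsc", "s3a://my-bucket/"
    )
    print(new_url)
    print(alter_table_statement("db.events", new_url))
```

The copy itself is not shown: it would still be done by the HCFS client on the LEFT cluster, which only needs to be able to write to the common storage location. The RIGHT cluster never has to reach the LEFT cluster's filesystem, since it only reads the relocated URL.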