ExpediaGroup / circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
Apache License 2.0

Added some comments in the readme about 'dfs.replication' #133

Closed by patduin 5 years ago

patduin commented 5 years ago

I've investigated #132. Because CT can run either by pushing data from a source cluster or by pulling data from a target cluster, it is very hard to determine whether the 'dfs.replication' value taken from the Hadoop configuration is correct or should be overridden. I'm afraid this is something users need to be aware of themselves. I've added a comment to the readme warning about this, along with a clear example of how to override the replication factor in Circus Train.

coveralls commented 5 years ago

Coverage remained the same at 74.927% when pulling 0a3624e69b53a8ae10f07e4ea98dd4aff2b8fa71 on issue-132 into 485d9f3502785f0175a0b9f4594ab72a0fda3511 on master.