As a user of CT, I'd like CT to use the target cluster's replication factor by default, so that I don't silently overwrite it.
CT configures the MapReduce job that copies the data using the Hadoop configuration (core-site.xml, etc.) of the cluster it runs on. When you push data to a different cluster, this means the copied data can end up with the source cluster's dfs.replication setting instead of the target cluster's.
We should change this behaviour so that CT does not override dfs.replication unless it is explicitly set in the copier-options section of the CT YAML configuration.
This is particularly problematic when CT runs on EMR, where dfs.replication defaults to 1 on small clusters, and replicates to an on-premises HDFS cluster, which usually uses dfs.replication=3.
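The proposed behaviour could look roughly like the following minimal sketch. It assumes the copy job's properties can be modelled as a plain Map<String, String> built from the local cluster's configuration, and that copier-options entries arrive as a second map; the class CopyJobConfSketch and the method jobConf are hypothetical names, not Circus Train's actual API. The inherited dfs.replication value is dropped and only set again when copier-options provides one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Circus Train's real API): derives the properties for
 * the copy job from the configuration of the cluster CT runs on, without
 * letting that cluster's dfs.replication leak into the target cluster.
 */
public final class CopyJobConfSketch {

  static final String DFS_REPLICATION = "dfs.replication";

  private CopyJobConfSketch() {}

  /**
   * @param localConf     properties loaded on the cluster CT runs on (core-site.xml, hdfs-site.xml, ...)
   * @param copierOptions entries from the copier-options section of the CT YAML configuration
   * @return a new map with the properties to apply to the copy job; inputs are not modified
   */
  static Map<String, String> jobConf(Map<String, String> localConf, Map<String, String> copierOptions) {
    Map<String, String> result = new HashMap<>(localConf);
    // Never inherit the replication factor from the cluster CT runs on.
    result.remove(DFS_REPLICATION);
    // Only an explicit copier-options entry may set it for the copied data.
    String explicit = copierOptions.get(DFS_REPLICATION);
    if (explicit != null) {
      result.put(DFS_REPLICATION, explicit);
    }
    return result;
  }

  public static void main(String[] args) {
    // Example: CT running on a small EMR cluster where dfs.replication is 1
    // (the host name below is made up for illustration).
    Map<String, String> emrConf = new HashMap<>();
    emrConf.put(DFS_REPLICATION, "1");
    emrConf.put("fs.defaultFS", "hdfs://emr-master:8020");

    Map<String, String> noOverride = jobConf(emrConf, Collections.<String, String>emptyMap());
    Map<String, String> explicitOverride =
        jobConf(emrConf, Collections.singletonMap(DFS_REPLICATION, "3"));

    System.out.println(noOverride.containsKey(DFS_REPLICATION)); // false
    System.out.println(explicitOverride.get(DFS_REPLICATION));   // 3
  }
}
```

One caveat worth verifying before implementing it this way: HDFS clients choose the replication factor when they create a file, based on their own configuration, so once the key is removed the job likely falls back to the client-side default (3 in Hadoop's hdfs-default.xml). That usually matches an on-premises target but is not guaranteed to equal the target's configured value; querying the target through FileSystem.getServerDefaults(Path).getReplication() may be a more precise alternative.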
Acceptance Criteria:
The target cluster's dfs.replication is not overridden unless it is explicitly configured in the CT configuration.