Open hpasumarthi opened 1 year ago
I came up with a small script that converts the distcp locations in the workbook into aws cli commands:
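# Print an "aws s3 sync" command for every s3a:// location in a distcp workbook.
# Usage: sh distcp_awscli.sh <workbook.md>
if [ -f "$1" ]; then
  echo "##Working on file : $1"
else
  echo "##File in the path $1 does not exist."
  exit 1
fi
echo "##Run set/export AWS_DEFAULT_PROFILE=sso.dev before running commands below"
# Keep only rows with s3a:// locations and rewrite the scheme to s3:// for the aws cli.
grep 's3a://' "$1" | sed 's/s3a:/s3:/g' | while read -r line
do
  # Field 3 of the pipe-delimited row is the right-hand location; xargs trims the padding.
  location_right=$(echo "$line" | cut -d '|' -f3 | xargs)
  # Field 4 holds the left-hand locations separated by <br>.
  # This relies on sed turning \n in the replacement into a newline, as GNU sed does.
  echo "$line" | cut -d '|' -f4 | sed 's/<br>/\n/g' | while read -r location_left
  do
    # The last path component is the table directory name.
    tbl_name="${location_left##*/}"
    case "$location_left" in
      *s3://*) echo "aws s3 sync $location_left $location_right/$tbl_name" ;;
    esac
  done
done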
Running the script prints the distcp locations as aws cli commands:
% sh distcp_awscli.sh testdev_airlines_RIGHT_distcp_workbook.md
##Working on file : testdev_airlines_RIGHT_distcp_workbook.md
##Run set/export AWS_DEFAULT_PROFILE=sso.dev before running commands below
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/flights s3://ps-uat22/testdev-iceberg/airlines-iceberg/flights
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/flights_iceberg s3://ps-uat22/testdev-iceberg/airlines-iceberg/flights_iceberg
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/planes s3://ps-uat22/testdev-iceberg/airlines-iceberg/planes
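To make the parsing concrete, here is a minimal sketch that runs the same steps on a single row. The row itself is hypothetical: its column layout (database, right-hand location, `<br>`-separated left-hand locations) is an assumption inferred from the fields the script reads, not copied from a real workbook. The function name `row_to_sync` is also made up for illustration, and `awk` is swapped in for the `sed` newline step so the sketch does not depend on GNU sed.

```shell
# Hypothetical workbook row; the layout is assumed, not taken from a real workbook.
row='| airlines | s3a://ps-uat22/testdev-iceberg/airlines-iceberg | s3a://ps-uat2/testdev-iceberg/airlines-iceberg/flights<br>s3a://ps-uat2/testdev-iceberg/airlines-iceberg/planes |'

# row_to_sync (hypothetical helper): print one "aws s3 sync" command per left-hand location.
row_to_sync() {
  # Rewrite the s3a:// scheme to s3:// for the aws cli.
  line=$(printf '%s\n' "$1" | sed 's/s3a:/s3:/g')
  # Field 3 is the right-hand location; xargs trims the surrounding spaces.
  target=$(printf '%s\n' "$line" | cut -d '|' -f3 | xargs)
  # Field 4 holds the left-hand locations; split them on <br>, one per line.
  printf '%s\n' "$line" | cut -d '|' -f4 | awk '{gsub(/<br>/, "\n"); print}' |
  while read -r source
  do
    case "$source" in
      *s3://*) echo "aws s3 sync $source $target/${source##*/}" ;;
    esac
  done
}

row_to_sync "$row"
```

Each left-hand location becomes one command whose destination is the right-hand location plus the table directory name, which is the same shape as the output above.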
Hello Team, in CDW Public Cloud on AWS, the data is stored in S3 buckets. Copying data via the hadoop cli or distcp is not possible for Public Cloud (PC) environments because we do not have Hadoop clusters. Can we enhance hms-mirror to use aws cli commands to copy data between the left and right table locations, i.e. the S3 buckets?
e.g. aws s3 cp s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET --recursive or
e.g. aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET
https://repost.aws/knowledge-center/move-objects-s3-bucket
The expectation is that, instead of running distcp commands, hms-mirror will use the aws cli to copy data from left to right. Regards, Hemanth
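However the commands end up being generated, it may help to preview them before any data moves. The sketch below appends the `--dryrun` option of `aws s3 sync` to each generated line, so the aws cli only lists what it would copy. The helper name `add_dryrun` is hypothetical and not part of hms-mirror; the sample line reuses the placeholder bucket names from this request.

```shell
# add_dryrun (hypothetical helper): read generated commands on stdin and
# append --dryrun to every line that starts with "aws s3 sync".
add_dryrun() {
  sed '/^aws s3 sync /s/$/ --dryrun/'
}

# Placeholder buckets from the request above; nothing is executed, only printed.
printf '%s\n' 'aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET' | add_dryrun
```

Lines that are not sync commands, such as the `##` status lines, pass through unchanged, so the whole output of a generator script can be piped through this filter.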