Open quiver opened 9 years ago
We can take this into consideration. Until then, the workaround is to use a shell script to achieve a similar effect.
This would be cool, especially if the CLI could run those copies in parallel. For example, at the moment I want to copy 13K files from different S3 locations. They are all in the same bucket but not in the same folder, so I have to write one 'aws s3 cp' command for each file, and it takes a lot of time to run.
The commands that I'm running are something like this:
aws s3 cp s3://example-bucket/0-200M/A.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/1000M-1500M/B.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/another-dir/C.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/0-200M/D.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/1000M-1500M/E.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/another-dir/F.json.gz s3://example-bucket/output-dir/
aws s3 cp s3://example-bucket/another-dir/H.json.gz s3://example-bucket/output-dir/
... 13K lines more with the same command, just changing the input s3 file..
This approach takes a lot of time. Is there any workaround for this kind of issue? If not, I think the tool should support a batch-cp where you can specify a list (or maybe a file) with all the files that you want to copy.
I agree. I am doing something identical to @ejoncas, and while this isn't bad, the timeout between each cp task makes this a several-hour process.
Any updates on this?
Besides scripting a loop for aws s3 cp, I've used aws s3 sync to accomplish this:
aws s3 sync --exclude=* --include=a* s3://bucket/
You can provide multiple --exclude and --include options. Above, I'm excluding everything and then including what I want.
I'd say without supporting multiple files copy, the CLI is seriously crippled. There are literally no justifiable reasons of not supporting this, merely due to the laziness of AWS engineers, and bad project management of AWS CLI. No excuses! Shame on you AWS CLI folks.
To fix the error from @yyolk's command (aws: error: too few arguments), add a source. For example, to sync the current folder to an S3 bucket, add . as the source:
aws s3 sync --exclude=* --include=a* . s3://bucket/
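When the wanted files share no common pattern, the same exclude-then-include idea can be driven from a list of keys. The following is only a sketch under some assumptions: copy_listed_keys, the keys file, and the AWS_CMD override (used for dry runs) are hypothetical names introduced here, not part of the AWS CLI. It relies on the --recursive, --exclude and --include options of aws s3 cp, with filter patterns interpreted relative to the source prefix. Each key should keep its relative path under the destination, and a very long list may hit the operating system's argument-length limit.

```shell
# Hypothetical helper: issue ONE recursive copy that excludes everything and
# then re-includes each key listed (one per line) in a file.
# AWS_CMD is an assumption added here so a dry run can use "echo aws".
copy_listed_keys() {
  keys_file=$1
  src=$2
  dest=$3
  # Start the argument list with the fixed part of the command.
  set -- s3 cp "$src" "$dest" --recursive --exclude "*"
  # Append one --include filter per non-empty line of the keys file.
  while IFS= read -r key; do
    if [ -n "$key" ]; then
      set -- "$@" --include "$key"
    fi
  done < "$keys_file"
  ${AWS_CMD:-aws} "$@"
}

# Example (keys are relative to the source prefix):
#   copy_listed_keys keys.txt s3://example-bucket/ s3://example-bucket/output-dir/
```

Setting AWS_CMD="echo aws" before calling the function prints the command instead of running it, which is a cheap way to check the generated filters first.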
Good Morning!
We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.
This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.
As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.
We’ve imported existing feature requests from GitHub - Search for this issue there!
And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.
GitHub will remain the channel for reporting bugs.
Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface
-The AWS SDKs & Tools Team
This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168382-why-aws-s3-cp-does-not-accept-multiple-sources
Based on community feedback, we have decided to return feature requests to GitHub issues.
Any update on this? Would still be great.
I think sync and cp are like sword and needle: they have different use cases. @ejoncas, your case was similar to mine. It's a use case for copy, not sync. Store the paths of the individual files, one per line, in a separate file called file_with_all_paths.txt
Something like this:
s3://example-bucket/0-200M/A.json.gz
s3://example-bucket/1000M-1500M/B.json.gz
s3://example-bucket/another-dir/C.json.gz
...
A bash loop can read through that file one entry at a time and run the cp command:
# Assumes the paths contain no whitespace, since $(cat ...) splits on it.
for f in $(cat ~/path_to_the_file/file_with_all_paths.txt); do
  echo "Now copying file $f"
  aws s3 cp "$f" s3://example-bucket/output-dir/
done
Although I am also a beginner, I did write a blog on how I accomplished it. Check it out here: http://www.onceaday.today/subjects/15/posts/152. If it helps someone, great!
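A loop like the one above runs one copy at a time, which is where most of the waiting comes from with thousands of files. As a rough sketch, xargs can run several copies concurrently from the same list file. The function name copy_from_list, the default of 8 parallel jobs, and the AWS_CMD override are assumptions made for illustration, not anything provided by the AWS CLI. The -P option is not POSIX, but GNU and BSD xargs both support it.

```shell
# Hypothetical helper: copy every S3 URI listed in a file (one per line)
# to a single destination, running up to $jobs copies at the same time.
# AWS_CMD is an assumption added here so a dry run can use "echo aws".
copy_from_list() {
  list_file=$1
  dest=$2
  jobs=${3:-8}
  # -I {} runs one command per input line, substituting the line for {}.
  # -P caps how many of those commands run in parallel.
  xargs -P "$jobs" -I {} ${AWS_CMD:-aws} s3 cp {} "$dest" < "$list_file"
}

# Example:
#   copy_from_list file_with_all_paths.txt s3://example-bucket/output-dir/ 16
```

How many parallel jobs help in practice depends on file sizes and request limits, so the number is worth tuning rather than taking the default as given.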
--include and --exclude are cute and useful, but only when there's a discernable pattern to the file names. If it's just a random-looking list of names, they're useless.
What would help tremendously would be the ability to read a list of source files from a file. Or just accept multiple source files as arguments - but reading that whole list from a file would be much more powerful.
aws s3 cp --source-files long_list.txt s3://bucket_name/
This needs to work with source files that are either local or in a bucket.
The CLI would then absolutely need to do batch copies, if the API allows it.
My suggestion:
aws s3 cp --source-files long_list.txt s3://bucket_name/
aws s3 cp "file1.xls,file2.jpg,file3.txt,file4.html" s3://bucket_name/
Has there been an update that enables this yet?
Thanks a lot @ejoncas for your answer, which helped me solve my problem! <3
I'd also like to see something like a --source-files parameter, but found this one-line bash loop to be a useful workaround for now:
for file in $(cat filenames.txt); do aws s3 cp "$file" s3://bucket-name; done
(very similar to what @sp2410 mentioned earlier)
This is a feature request.
It would be great if the s3 cp command accepted multiple sources, just like the bash cp command. For example