aws / aws-cli

Universal Command Line Interface for Amazon Web Services

aws s3 sync does not synchronize s3 folder structure locally #912

Open tatobi opened 10 years ago

tatobi commented 10 years ago

aws s3 sync does not fully synchronize the S3 folder structure locally, even when I use it with the --delete or --recursive arguments:

$ aws --version
aws-cli/1.4.3 Python/2.7.6 Linux/3.13.0-35-generic

$ aws s3 ls s3://s3.testbucket
$ aws s3 ls s3://s3.testbucket/
$ mkdir s3.testfolder
$ mkdir s3.testfolder/test1
$ aws s3 sync ./s3.testfolder s3://s3.testbucket/
$ aws s3 ls s3://s3.testbucket/
$ touch s3.testfolder/test1/1
$ aws s3 sync ./s3.testfolder/ s3://s3.testbucket/
upload: s3.testfolder/test1/1 to s3://s3.testbucket/test1/1
$ aws s3 sync ./s3.testfolder s3://s3.testbucket/
$ mkdir ./s3.testfolder/test-to-delete
$ aws s3 sync s3://s3.testbucket/ ./s3.testfolder/ --delete --recursive
$ aws s3 sync s3://s3.testbucket/ ./s3.testfolder/ --delete
$ ls -lah ./s3.testfolder/
total 60K
drwxrwxr-x  4 tobi tobi 4,0K szept 12 15:24 .
drwx------ 71 tobi tobi  44K szept 12 15:22 ..
drwxrwxr-x  2 tobi tobi 4,0K szept 12 15:23 test1
drwxrwxr-x  2 tobi tobi 4,0K szept 12 15:24 test-to-delete

$ aws s3 ls s3://s3.testbucket/
                           PRE test1/

kyleknap commented 10 years ago

This is known behavior. The sync command behaves this way because S3 does not physically use directories. There are only buckets and objects. Objects have key prefixes that act like directories, but S3 does not designate a specific physical object to be a directory.
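To illustrate where the `PRE test1/` line in the report comes from, here is a simplified, pure-Python model of how a listing with a `/` delimiter groups keys into "common prefixes". It is a sketch for illustration only, not the actual S3 implementation; the function name `list_with_delimiter` is hypothetical. The point is that a prefix only exists while at least one object key contains it.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Simplified model of a delimited S3 listing (illustrative sketch).

    Returns (objects, common_prefixes): keys sitting directly under `prefix`,
    and the directory-like prefixes that `aws s3 ls` prints as PRE entries.
    """
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            # No further delimiter: this key is an object at this level.
            objects.append(key)
        else:
            # Everything up to and including the next delimiter is grouped.
            prefixes.add(prefix + rest[:idx + len(delimiter)])
    return sorted(objects), sorted(prefixes)


# After the second sync in the report, the bucket holds one object.
print(list_with_delimiter(["test1/1"]))  # ([], ['test1/'])
# Before that, the bucket was empty, so no prefix exists for the empty folder.
print(list_with_delimiter([]))  # ([], [])
```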

Therefore, only files are transferred during a sync, because S3 has no physical directories. When you sync empty directories, nothing is uploaded because they contain no files. Once you put files in a directory, each file is uploaded with a key prefix that represents the directory.
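A minimal sketch of the file-only behavior described above: walking a local tree and emitting one key per regular file shows why an empty directory produces nothing to upload. The helper `keys_for_upload` is hypothetical and only models the mapping; it is not aws-cli code.

```python
import os
import tempfile


def keys_for_upload(root):
    """Return the keys a file-only sync would create for a local tree.

    Only regular files become objects; a directory shows up solely as part
    of the key prefix of the files inside it.
    """
    keys = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # S3 keys use "/" as the separator regardless of the local OS.
            keys.append(rel.replace(os.sep, "/"))
    return sorted(keys)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "test1"))
        print(keys_for_upload(root))  # [] -> nothing to upload
        open(os.path.join(root, "test1", "1"), "w").close()
        print(keys_for_upload(root))  # ['test1/1']
```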

tatobi commented 10 years ago

Thank you Kyle, that is clear. I know how S3 stores files, but sometimes we need the same directory structure in several places, including empty directories, and we need directories removed once they are no longer needed. For example, suppose you have a complex directory structure with a lot of content locally, and you sync it to S3. An automated mechanism then periodically syncs this structure to several running instances. You keep S3 up to date, deleting most of the content, and the automated job re-syncs to the places it synced to before. Unfortunately, the original complex directory structure remains forever on the sync targets. This can cause confusion when you inspect them, or when your program tries to use these empty folders, because you always need the same structure everywhere. Moreover, people who use the --delete option may have used the rsync equivalent on Linux before, which keeps folders in sync, so they expect the same behavior. I think it would not be hard to implement a switch or option for the aws tool that detects whether an S3 object is a file or a folder (via listing, size, etc.) and creates or deletes folders locally or in an S3 bucket (e.g. list(bucket.list("", "/"))).
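The request above can be sketched as a planning step: derive every directory implied by the remote keys, then compare that set with the local directories. This is a hypothetical sketch, not an aws-cli feature; `dirs_from_keys` and `plan_directory_sync` are illustrative names, and the key list is assumed to come from some listing call (such as the one suggested in the comment). It also assumes that a key ending in `/` is a zero-byte "folder marker" object, as some tools, including the S3 console, create.

```python
import posixpath


def dirs_from_keys(keys):
    """Every directory implied by a set of keys, e.g. 'a/b/c.txt' -> {'a', 'a/b'}.

    A key ending in '/' is assumed to be a folder marker and counts as a directory.
    """
    dirs = set()
    for key in keys:
        path = key.rstrip("/") if key.endswith("/") else posixpath.dirname(key)
        while path:
            dirs.add(path)
            path = posixpath.dirname(path)
    return dirs


def plan_directory_sync(keys, local_dirs):
    """Return (to_create, to_delete) for a hypothetical empty-directory sync.

    to_delete is ordered deepest first, so children are removed before their
    parents; a real implementation should only remove directories that are
    empty after the regular --delete pass has removed stale files.
    """
    remote = dirs_from_keys(keys)
    local = set(local_dirs)
    to_create = sorted(remote - local)
    to_delete = sorted(local - remote, key=lambda d: (-d.count("/"), d))
    return to_create, to_delete


# The scenario from the report: only test1/1 exists in the bucket.
print(plan_directory_sync(["test1/1"], ["test1", "test-to-delete"]))
# ([], ['test-to-delete'])
```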

kyleknap commented 10 years ago

That makes sense. Will look into adding a feature for it.

ururk commented 9 years ago

This would be very useful for our situation as well. If it were added as an option (--sync-empty-directories) people could choose to use it when needed.

xbeta commented 9 years ago

+1 Need this feature very badly

w32-blaster commented 9 years ago

+1. Would like to use it.

danielwinter83 commented 9 years ago

+1

unixmonkey commented 9 years ago

I also was surprised by this behavior, given that it is called "sync". I can work around this in my particular use case, but future users could be spared the pain :)

athurber commented 9 years ago

+1 on being able to sync directory structure! If you delete a folder it only removes the content, but it leaves the folder behind...

zioalex commented 9 years ago

+1. I have the same needs.

benjaminsherwood commented 9 years ago

+1 - surprised this hasn't been implemented yet. Sure, in my case it doesn't matter too much, and I can work around it (or just use placeholder files when creating structures), but it would be nice to have it supported by either s3 sync or s3 cp.

cauboy commented 9 years ago

+1

s3cmd sync does keep the folder structure, but it has some issues with granting access while syncing, so one needs to run s3cmd setacl --recursive afterwards.

ntmggr commented 9 years ago

+1

ghost commented 9 years ago

+1

asdesilva commented 9 years ago

+1

jamesls commented 9 years ago

Thanks for the feedback everyone. I think the best option I've seen is to add a --sync-empty-directories option. Let's do that.

xbeta commented 9 years ago

@jamesls I'm expecting something like rsync's functionality, though S3, as an object store, is definitely not the same.

makeittotop commented 9 years ago

+1

mradochonski commented 9 years ago

+1

mradochonski commented 9 years ago

Any timeline for this feature?

swarupdonepudi commented 8 years ago

As a temporary workaround, I added an empty .s3keep file to the empty directories, and it works for me. It's the same hack I usually use to get git to track empty directories :)
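The placeholder workaround can be automated before each upload. The sketch below drops an empty `.s3keep` file (the name used in the comment above; any name works) into every directory that has no files and no subdirectories. `add_placeholders` is a hypothetical helper, not part of aws-cli. A directory that contains only empty subdirectories needs no placeholder of its own, because its children's placeholders already carry its prefix.

```python
import os
import tempfile

PLACEHOLDER = ".s3keep"  # name from the workaround above; any name works


def add_placeholders(root, name=PLACEHOLDER):
    """Create an empty placeholder file in every empty directory under root.

    Returns the '/'-separated relative paths of the placeholders created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A leaf directory with no files would otherwise produce no S3 keys.
        if not dirnames and not filenames:
            path = os.path.join(dirpath, name)
            open(path, "w").close()
            created.append(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(created)
```

After running it on the local tree, a normal `aws s3 sync ./s3.testfolder s3://s3.testbucket/` uploads the placeholders, so the otherwise-empty prefixes exist in the bucket. Running it again is harmless, because the directories are no longer empty.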

lcasey001 commented 8 years ago

Will this also allow removing empty directories on S3?

arcadas commented 8 years ago

+1

andrefelipe commented 8 years ago

+1

jp30566347 commented 8 years ago

+1

mputilin commented 8 years ago

+1

ixtli commented 8 years ago

+1

asyavuz commented 8 years ago

+1

matteomelani commented 8 years ago

+1

Wizacorn commented 8 years ago

+1

dijeesh commented 8 years ago

+1

ghost commented 8 years ago

+1

xam7247 commented 8 years ago

+1

it-marmalade commented 8 years ago

+1

steven-klein commented 8 years ago

+1

moradai commented 8 years ago

+1

miggy282 commented 8 years ago

+1

ritxos commented 8 years ago

+1

Makes a lot of sense during data migrations to S3.

thoralf-gutierrez commented 8 years ago

+1

bdo-eow commented 8 years ago

+1 Just got smashed by this... Arg....

murraybrad13 commented 8 years ago

+1

quater commented 8 years ago

+10 It's possible to work around this with dummy files, but it would be cleaner if there were an option to force an empty prefix to synchronize.

ghost commented 8 years ago

+1. Use case: backing up an svn repository.

More generally:

aws s3 sync thing
aws s3 sync thing_copy

I expected thing_copy to match thing exactly.
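To check whether a round trip like the one above really reproduced the original tree, including empty directories, a small comparison of the two local trees is enough. This is a sketch; `tree_snapshot` and `diff_trees` are hypothetical helper names, and the check is purely local (it does not talk to S3).

```python
import os
import tempfile


def tree_snapshot(root):
    """Return ('/'-separated) relative paths of all files and directories under root."""
    files, dirs = set(), set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for d in dirnames:
            dirs.add(os.path.normpath(os.path.join(rel, d)).replace(os.sep, "/"))
        for f in filenames:
            files.add(os.path.normpath(os.path.join(rel, f)).replace(os.sep, "/"))
    return files, dirs


def diff_trees(src, dst):
    """Report what dst is missing, or has in excess, compared with src."""
    src_files, src_dirs = tree_snapshot(src)
    dst_files, dst_dirs = tree_snapshot(dst)
    return {
        "missing_files": sorted(src_files - dst_files),
        "extra_files": sorted(dst_files - src_files),
        "missing_dirs": sorted(src_dirs - dst_dirs),
        "extra_dirs": sorted(dst_dirs - src_dirs),
    }
```

With the current file-only sync, `missing_dirs` lists exactly the empty directories (such as empty parts of an svn repository) that did not survive the round trip.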

hesu commented 8 years ago

+1

jumping commented 8 years ago

+1

ghost commented 8 years ago

+1

carhensi commented 8 years ago

+1 need to delete empty directories

hierony94 commented 8 years ago

How is progress on adding the --sync-empty-directories option? Any feedback from the AWS team? Thanks.

jimbocoder commented 8 years ago

+1 would be a very useful feature for a very useful tool

SirR4T commented 8 years ago

+1