danil-smirnov opened this issue 5 years ago
Could you elaborate on the exact scenario in which you would want to use --exact-timestamps for uploads? I believe making it only applicable to downloads was done on purpose because we do not have any control over the timestamp of the object stored in S3 (the time of upload is used). So if you uploaded a file to S3 with s3 sync and ran s3 sync again using --exact-timestamps, the CLI would reupload the file to S3 because the timestamp in S3 will differ from the local file's.
Hi @kyleknap!
The --exact-timestamps flag sounds to me like "overwrite even if the sizes are the same", and effectively it does seem to work this way. I assume the flag is quite important for avoiding cases like the one described in this issue.
Sample scenario: we have processes that write files locally and sync them to S3, separated in time (e.g. some async CI/CD pipeline). First one process writes a file to local folder1/, then a second process writes a same-sized file with different content to folder2/. An async process syncs folder1/ to the S3 folder/. After that, another async process tries to sync folder2/ to the same S3 folder/. In this scenario the second sync won't overwrite the file, because folder2/'s copy is older than the object in folder/.
In this case using the --exact-timestamps flag would resolve the issue.
Also, it seems really counterintuitive to me that --exact-timestamps is quietly ignored when copying from local to S3, without any warning. If this flag is firmly considered downloads-only, I would like to see a warning when it is used for an upload, to avoid confusing users.
Thank you for reading
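The scenario above comes down to how sync decides whether to copy a file. A rough model of that decision for local -> S3 (a hypothetical simplification for illustration, not the actual awscli code; the `should_upload` name and the upload-side behaviour of the flag are assumptions):

```python
from datetime import datetime, timedelta

def should_upload(local_size, local_mtime, remote_size, remote_mtime,
                  exact_timestamps=False):
    # Hypothetical simplification of sync's local -> S3 comparator.
    if local_size != remote_size:
        return True                          # size mismatch always copies
    if exact_timestamps:
        return local_mtime != remote_mtime   # behaviour the flag's name suggests
    return local_mtime > remote_mtime        # default: only strictly newer files win

now = datetime(2024, 1, 1, 12, 0)
older = now - timedelta(hours=1)

# folder2/'s file: same size as the object in folder/, but an older mtime,
# so the default comparison skips it even though the content differs.
print(should_upload(100, older, 100, now))                         # False
print(should_upload(100, older, 100, now, exact_timestamps=True))  # True
```

This is why the second sync in the scenario silently does nothing: same size plus an older timestamp means "up to date" under the default rule.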
I completely agree with @danil-smirnov that the AWS CLI should not quietly ignore --exact-timestamps when the target is an S3 bucket.
My preference would be for it to fail with a usage error. Second best would be to at least print a warning.
There are probably lots of people (like me) using this flag, thinking it's helping to solve their sync-to-s3 issues, when in fact it's only giving them a false sense of security.
This is one of the most frustrating aspects of sync I've come across: --exact-timestamps only applies to downloads to local.
I have a version file which I CAN'T update with sync, because the original file and the new file are the same size but have different contents.
I'd like to roll back to a previous version for QA purposes and I'm unable to do this, because the newer file takes priority when the sizes are the same.
The effect of this is that moving from version 1.0.4 to 1.0.3 can't be done in my case: the file used to determine the version is the same size and has different content, but won't be replaced because the previous version is 'older'.
My use case is that I want to sync the artifacts in my dev account to my local disk, make a couple of changes for tracking version history etc., then upload the local disk to the prod account. But when the prod account has a file that's newer than what I want to push from my dev account, aws s3 sync won't cut it.
This is amazing. There's no way to run "aws sync" to s3 in a way that guarantees that the local files match the remote ones? That makes this command kind of useless. Is there any better way to handle this other than deleting all the files and re-uploading?
I have a situation where a target S3 bucket is being synced from two different sources. If "source a" uploads a newer version of an existing file to the target, and "source b" has an older file with a different size, aws sync will overwrite with the older file. How do I avoid this behavior? I want to ONLY sync using newer modified dates.
Agree that --exact-timestamps would be useful in s3 -> s3 or local -> s3 scenarios, and I'm not sure I understand what prevents it being included?
The initial response indicates that it would create unnecessary uploads in the case of running multiple syncs with the same source and destination. I understand that limitation but the functionality would still be useful for other use cases. Even with that limitation it's an improvement on the current approaches you'd have to use which all involve running at least 2 commands and uploading everything regardless of whether the timestamps match.
If there's a concern around altering the behaviour of this then why not instead put in a different flag? It seems strange that there's no way to run sync that ensures the content of the destination is what's in the source.
In terms of use case, mine is using one bucket to hold versioned builds of a static website and another bucket to host the website. As pointed out above, rollback is currently impossible with sync, and even updating to a later version can break depending on when the versions are first put in S3 and when they are first moved to the hosting bucket.
As a side note for anyone who comes across this issue: I think first copying with --recursive and then syncing with --delete to clean up is a better workaround than deleting first.
How are you guys circumventing this? Do you first remove all files from S3 and then sync? I'm getting issues whenever my CI/CD pipeline runs a revert operation.
We switched to using https://rclone.org/ for sync operations. It behaves as expected.
I'm also looking for some way to guarantee that the files I'm syncing from local will all be copied to S3, regardless of file sizes or timestamps. I have the folder contents that I'd like to sync and would really like a simple way to push them up.
I have the exact same setup as you. Any chance you could elaborate on your statement about how doing a copy first helps out this situation?
@WonderPanda - If I remember correctly the issue is that existing files are not always replaced when using sync. My workaround is this:
aws s3 cp --recursive s3://$SOURCE_BUCKET s3://$DESTINATION_BUCKET &&
aws s3 sync --delete s3://$SOURCE_BUCKET s3://$DESTINATION_BUCKET
The copy command ensures every file in source is now in destination. The sync command then removes any files in destination that aren't in source.
FWIW we've been using that since my original comment without issue.
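The semantics of that two-step workaround can be modelled as plain dictionary operations. This is only a toy sketch (bucket contents as key -> content maps; the function names and sample keys are hypothetical), not the CLI itself:

```python
def cp_recursive(source: dict, dest: dict) -> dict:
    # Model of `aws s3 cp --recursive`: copy every key, overwriting blindly,
    # with no size or timestamp comparison.
    return {**dest, **source}

def sync_delete_pass(source: dict, dest: dict) -> dict:
    # Model of the `sync --delete` cleanup: drop keys absent from source.
    # (Real sync would also copy changed files, but after the cp step
    # there is nothing left to copy.)
    return {k: v for k, v in dest.items() if k in source}

src = {"version.txt": "1.0.3", "app.js": "new build"}
# Destination holds a same-sized, "newer" version.txt that sync alone would keep,
# plus a leftover file that should disappear.
dst = {"version.txt": "1.0.4", "stale.css": "leftover"}

dst = cp_recursive(src, dst)      # version.txt overwritten unconditionally
dst = sync_delete_pass(src, dst)  # stale.css removed
print(dst == src)  # True: destination now mirrors source
```

The design trade-off is the one noted earlier in the thread: the cp step re-uploads everything, even files that were already identical.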
There is an error-prone situation, described in AWS S3 Sync Issues, where same-sized files fail to be updated.
To help users avoid it, the --exact-timestamps flag was added to the s3 sync command with this PR. However, the implementation is enabled for the download operation only, i.e. when files are being copied from S3 to local.
Hence, in the case of uploading files from local to S3 this flag does not work (it is ignored). I think this is rather counterintuitive and should be changed so that the flag is respected in all directions.