aws / aws-cli

Universal Command Line Interface for Amazon Web Services

symlinks are evaluated before exclude\include - "File does not exist" #7072

Open gidonk-pr opened 2 years ago

gidonk-pr commented 2 years ago

Describe the bug

aws s3 sync . s3://my_bucket --exclude "*" --include "package*"

If the source directory contains a symlink that points to a missing file, the command emits a "File does not exist" warning, even though the symlink's name does not match the filters applied to the command (i.e., it does not start with "package").

Expected Behavior

Ignore symlinks whose names do not match the filters, and suppress the related warnings and errors.

Current Behavior

Symlinks whose targets are missing still produce warnings, even when they do not match the command's filters.

Reproduction Steps

  1. Create a symlink to a nonexistent file.
  2. Sync the parent directory to an S3 bucket while excluding the symlink from step 1.
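
The two steps above can be sketched as a small script. This is a minimal setup under stated assumptions: the file names package.json and broken-link are hypothetical, and my_bucket is the placeholder bucket name from the command in the report. The sync command itself is left as a comment because it needs AWS credentials and a real bucket.

```shell
#!/usr/bin/env bash
# Build a throwaway directory with one file that matches the filters
# and one symlink whose target does not exist.

make_repro_dir() {
    local dir
    dir="$(mktemp -d)"
    echo '{}' > "$dir/package.json"                  # matches --include "package*"
    ln -s "$dir/does-not-exist" "$dir/broken-link"   # target is never created
    printf '%s\n' "$dir"
}

repro_dir="$(make_repro_dir)"
echo "Repro directory: $repro_dir"

# Step 2 (run manually; requires credentials and an existing bucket):
#   aws s3 sync "$repro_dir" s3://my_bucket --exclude "*" --include "package*"
```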

CLI version used

2.5.2

Environment details (OS name and version, etc.)

Linux/5.10.104-linuxkit (docker image: node:16.14.0-bullseye-slim)

tim-finnigan commented 2 years ago

Hi @gidonk-pr, thanks for bringing this to our attention. I could reproduce the behavior you described, though I think this is somewhat of an edge case. You could use the --quiet flag if you want to suppress the output, or the --no-follow-symlinks parameter if you want to avoid following symlinks altogether. There are also several issues about improving s3 sync functionality that this may overlap with to some extent.
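
For reference, the two suggested workarounds could look like the sketch below, applied to the command from the original report. The function names sync_quiet and sync_no_symlinks are hypothetical wrappers introduced only for illustration. Nothing in this thread confirms whether either flag changes the command's exit code, so treat them as ways to reduce noise rather than as a fix.

```shell
#!/usr/bin/env bash
# $1 = local source directory, $2 = destination such as s3://my_bucket

# Workaround 1: suppress the command's output with --quiet.
sync_quiet() {
    aws s3 sync "$1" "$2" --exclude "*" --include "package*" --quiet
}

# Workaround 2: do not follow symlinks at all.
sync_no_symlinks() {
    aws s3 sync "$1" "$2" --exclude "*" --include "package*" --no-follow-symlinks
}

# Usage (my_bucket is a placeholder):
#   sync_no_symlinks . s3://my_bucket
```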

keystrike commented 1 year ago

The same happens when symlinks point to special files, e.g. warning: Skipping file /etc/systemd/system/reboot.service. File is character special device, block special device, FIFO, or socket.

This behavior makes it difficult to sync directories like /etc/, especially when running the command from a bash script. The bash script interprets these warnings as errors, which gives the impression that the backup process failed.

It would be helpful if aws s3 sync could either:

  1. Not throw warnings for symlinks to non-existing or special files, especially when those files do not match any filters applied to the command.

  2. Provide an option to exclude these files from the sync process, so that they do not cause warnings.

This would make the output cleaner and easier to interpret, especially when running aws s3 sync from a script.
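
Until something like this exists, a script can at least list the symlinks that are likely to trigger these warnings before it runs the sync. The sketch below is an assumption-laden diagnostic, not part of the AWS CLI: list_problem_symlinks is a hypothetical helper that prints every symlink whose target is neither a regular file nor a directory, which covers dangling links and links to devices, FIFOs, or sockets.

```shell
#!/usr/bin/env bash
# Print symlinks under directory $1 whose target is missing or is a
# special file. `test -f` and `test -d` follow symlinks, so both fail
# for a dangling link and for a link to a device, FIFO, or socket.
list_problem_symlinks() {
    find "$1" -type l ! -exec test -f {} \; ! -exec test -d {} \; -print
}

# Usage:
#   list_problem_symlinks /etc
```

The output can be reviewed by hand or used to decide whether --no-follow-symlinks is acceptable for that directory.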

arthur-guru commented 1 year ago

I have a similar, related issue, but I am using rsync to populate a target folder that I then run aws s3 sync on with filters.

By default, rsync writes each file to a temporary dot file while it is being transferred, and that dot file is then renamed (rather than removed) once the individual transfer completes.

If aws s3 sync runs on the rsync target folder with filters that exclude these dot files while an rsync is still in progress, and a dot file disappears in the meantime, aws s3 sync emits a warning and, after completing the sync, exits with error code 2.

It appears to me (though I have not confirmed it) that when aws s3 sync starts, it reads the entire list of source files into memory without applying the filters. It then iterates through that list in a second phase to sync the files to the target. If a source file has been deleted in the meantime (or is a broken symlink), aws s3 sync does not appear to check it against the filters you provided, so you get a warning and an exit code of 2. I guess the rationale is to take a quick snapshot of the source folder before its contents change, since filtering at this stage can be very slow, especially with complex filters (and the filters are not regex but something bespoke).

Fortunately, the only impact of this issue is a warning and an annoying exit code of 2. The exit code of 2 (file not found) is a pain, considering I have already asked aws s3 sync to exclude the file from the transfer. Maybe this exit code handling can be improved in a future release.

Alternatively, rsync has options for keeping its temporary dot files in another folder, but that approach carries its own baggage.
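
In the meantime, a calling script can handle the exit code of 2 described above explicitly instead of treating it as a failure. The sketch below assumes that, for your CLI version, code 2 from an s3 command means some files were skipped while the rest transferred; check the return-code documentation before relying on that. run_tolerating_skips is a hypothetical helper name, not an AWS CLI feature.

```shell
#!/usr/bin/env bash
# Run the given command. Map exit code 2 to success with a note on
# stderr, and pass every other exit code through unchanged.
run_tolerating_skips() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 2 ]; then
        echo "note: command exited with 2; treating skipped files as non-fatal" >&2
        return 0
    fi
    return "$rc"
}

# Usage (my_bucket and the local path are placeholders):
#   run_tolerating_skips aws s3 sync /data/rsync-target s3://my_bucket --exclude ".*"
```

The trade-off is that a genuinely missing file you did want transferred is also reported with code 2, so the note on stderr is worth keeping in the logs.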