drzraf opened this issue 8 years ago
I played around with this, and I too see that the `--exclude` behavior is not working well (this possibly has to do with the fact that the root folder is being prepended to the patterns).
Perhaps this is related: I am using aws-cli/1.10.0 and cannot ever get a file like `._3E53F853-DA82-4926-AC21-5C1096FB126C.MP3` to be included or excluded:

```shell
aws s3 rm s3://foo/foo2/foo3 --include "*/._*" --exclude "*" --recursive --dryrun
aws s3 rm s3://foo/foo2/foo3 --include "*._*" --exclude "*" --recursive --dryrun
aws s3 sync /foo/foo2/foo3 s3://foo/foo2/foo3 --exclude "*/._*" --delete --dryrun
```
According to #548, it seems as though the behavior is to prepend each exclude/include pattern with the current working directory. I was able to verify this behavior with this test
Given the directory structure:

```
test1.txt
test2/
  test2.txt
  test3/
    test3.txt
```

With `aws s3 sync . s3://examplebucket --exclude "test2/test3/*"`, test3.txt is excluded.

With `aws s3 sync . s3://examplebucket --exclude "test3/*"`, test3.txt is not excluded.
So it seems paths are always relative to the root of the sync, which is a bit unintuitive in my opinion. I think most users would prefer rsync-style syntax, where you must prepend the pattern with "/" for it to be anchored at the root of the sync.
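The root-relative matching described above can be sketched with Python's `fnmatch` (an illustrative approximation, not the actual awscli code; the real filter logic lives in `awscli.customizations.s3.filters`):

```python
import fnmatch
import os

def matches(src_root, pattern, path):
    """Approximate the CLI's apparent behavior: the pattern is joined
    onto the sync source root before being compared against the full path."""
    full_pattern = os.path.join(src_root, pattern)
    return fnmatch.fnmatchcase(path, full_pattern)

# Directory structure from the test above, synced from "."
path = "./test2/test3/test3.txt"

# A pattern spelling out the full path from the sync root matches...
print(matches(".", "test2/test3/*", path))  # True: test3.txt is excluded
# ...but the same pattern written relative to a subdirectory does not.
print(matches(".", "test3/*", path))        # False: test3.txt is not excluded
```

Under this model an exclude pattern only takes effect if it spells out the full path from the sync root, which is consistent with the test results above.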
Great detective work, kirkmadera. That is quite unintuitive, but at least now I can understand what is happening.
I'm using it exactly like that (except I use many "exclude" statements one after the other) and still it won't honor the exclusion.
It's not just "relative paths". If I have mixed data files, such as this, I cannot effectively exclude one while including the other:
```
dirname/log.1
dirname/log.2
dirname/log.3
dirname/log.foo.1
dirname/log.foo.2
dirname/log.foo.3
```
As soon as I use any `--include=` statement that would include both, no combination of arguments allows me to exclude the other logs. Combined with the non-intuitive behavior of `--exclude='*'` having to come before `--include=[whatever]`, it makes for a very confusing operation.

```shell
aws s3 sync --exclude='*' --include='log.*' --exclude='log.foo.*' dirname/ s3://bucket/dirname/
```
This has been a major flaw with `aws s3 sync` for years: there is no way to exclude dotfiles such as `.DS_Store` or directories such as `.git` or `.DAV`. I have spent hours trying different invocations, and the `--exclude` argument appears to have no effect at all.

As others have stated, the `--exclude` flag should work as it does in rsync. We need to be able to easily indicate file and directory names for exclusion anywhere they might appear in the path being synchronized.
It would be great if you could just follow the rsync conventions. Bonus if it would allow you to specify an rsync-style filter file.
"happy" to see I'm not the only one getting beaten to death by this option… It really does hurt.
I'm also seeing this bug with `aws s3 cp --recursive --exclude '.*' localpath s3://bucket/path`. The trouble is I'm trying to exclude a file that's unreadable to the current user, but regardless of the exclude patterns the file is still attempted (in a way).

The core problem is that unreadable files turn into warnings before the `--exclude` filter is applied.
The file generator and filter instructions are inserted one after another here. When the file generator runs to list available files in the hierarchy, it triggers warnings for unreadable files, completely unaware of the later filtering stage that would skip those files altogether.

I suggest that maybe `FileGenerator` should not call `triggers_warning` on the happy code path (like it does in `should_ignore_file`); instead, either there needs to be an extra instruction stage that checks for file warnings only after filtering, or `FileGenerator` and `Filter` should merge into a single instruction.
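The suggested reordering could look roughly like this (a sketch with hypothetical function names, not the actual `FileGenerator`/`Filter` classes from awscli):

```python
import fnmatch
import os

def generate_files(root):
    """List every file under root without reading it, so unreadable
    files do not trigger warnings at this stage."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def apply_filters(paths, excludes):
    """Drop excluded paths BEFORE any readability check happens."""
    for path in paths:
        if any(fnmatch.fnmatch(path, pat) for pat in excludes):
            continue
        yield path

def warn_unreadable(paths):
    """Only paths that survived filtering are checked for readability."""
    for path in paths:
        if not os.access(path, os.R_OK):
            print("warning: cannot read %s" % path)
            continue
        yield path

# Pipeline: generate -> filter -> warn. An excluded-but-unreadable
# file is dropped by apply_filters and never produces a warning.
```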
Can verify it's still an issue...
This is not so much an issue in the code; rather, it is poorly documented. The documentation needs to explicitly state what is in this comment:

Nowhere in the documentation does it state that the result of the filtering operation depends on the position of the exclude and include filters. For example:
```shell
aws s3 cp mydir s3://mybucket --recursive --exclude "*" --exclude "*/*" --include "*.txt"
```

This results in copying all txt files in subdirectories, since the include filter matches last and will match files in subdirectories, overriding the exclude of `*/*`.
```shell
aws s3 cp mydir s3://mybucket --recursive --exclude "*" --include "*.txt" --exclude "*/*"
```

This works as intended: subdirectories are excluded, since for them the last matching filter is an exclude.
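The last-match-wins rule just described can be simulated in a few lines (an illustration with plain `fnmatch`; it ignores the source-path prepending discussed earlier):

```python
import fnmatch

def decide(path, filters):
    """filters is a list of ('exclude' | 'include', pattern) pairs.
    Every file starts out included; the LAST matching filter wins."""
    included = True
    for kind, pattern in filters:
        if fnmatch.fnmatchcase(path, pattern):
            included = (kind == "include")
    return included

path = "sub/notes.txt"

# --exclude "*" --exclude "*/*" --include "*.txt": the include matches
# last, so files in subdirectories are copied.
print(decide(path, [("exclude", "*"), ("exclude", "*/*"), ("include", "*.txt")]))  # True

# --exclude "*" --include "*.txt" --exclude "*/*": the trailing exclude
# matches last, so files in subdirectories are skipped.
print(decide(path, [("exclude", "*"), ("include", "*.txt"), ("exclude", "*/*")]))  # False
```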
My main problem with the way the code is implemented is that there is no way to stop recursion into a directory that is excluded. This results in totally unnecessary deep recursion and long run times. There could be an option such as `--excludestop` that, when a directory matches, skips any further checks of that directory tree.
Also having this issue when trying to exclude any files beginning with `._`. I'm trying to use this to sync Avid projects to a server, using the following command:

```shell
for f in /Volumes/*Projects; do
  /usr/local/bin/aws s3 sync "$f" "s3://[companyname]-avid.backups/${f/\/Volumes\//}/" \
    --exclude "*.lck" --exclude "._*" --exclude "*/SearchData/*" \
    --exclude "*/WaveformCache/*" --exclude "*/Unity Attic/*"
done
```

All the other filters work as expected, but it's still uploading any files starting with `._`, which is irritating as all of those are just dud files Avid generates when using shared storage.
At this point I am convinced that the reason AWS provides defective tools and does not fix them is that S3 and Glacier are more (most?) profitable when people use them to store tons of files which are smaller than the minimum billable increment. Bummer, but not too surprising.
@bithive considering the tool is 100% open source, and you could modify it yourself to fix observed issues, I absolutely cannot agree with your thesis.
One thing we should keep in mind, mentioned here in reference to this issue:
For that, I used the --exclude="src/*" parameter (I also make sure the aws command is called from my $HOME directory, since I learned that the filters start matching from the current directory -- more details can be found in the #1588 issue).
Maybe using something like `**/.*` would work in your cases? I use it in a cron job and hidden files aren't sent to S3. Hoping it helps in your case.
This is terribly confusing, but I kind of figured it out. I recommend running with the `--debug` and `--dryrun` flags to see what's going on, for example:

```shell
aws s3 sync <source> <dest> --debug --dryrun --exclude ...
```
In my case:
```shell
aws s3 sync D:\test\ s3://bucket/ --dryrun --debug --exclude "Baks/*"
```
You will see something like this:
```
awscli.customizations.s3.filters - DEBUG - d:\test\Baks\1.rtf matched exclude filter: d:\test\Baks\*
```
That means that for each exclude, the origin path is prepended and that is used to match the destination path.
Now, if I run the same on the root of the drive:
```shell
aws s3 sync D:\ s3://bucket/ --dryrun --debug --exclude "Baks*"
```
That's where it gets complicated. The reason is that the path being matched will have two backslashes, for example:

```
d:\\Baks\.... did not match exclude filter: d:\Baks*
```

From my tests:

- With `//folder*`, the path is matched against `d:\\folder*` and `\\folder*`, which will match.
- With `//folder/*`, the path is matched against `\\folder\*` and `\\folder\*` (yes, the exact same pattern twice), which will not match.
- Without `\\` or `//` at the beginning, it won't match, as the path will contain them.
- `*/folder/*` will match, but will also match that folder anywhere.

Sorry if it's not clear; the behavior is very unpredictable. The best bet is to enable `--debug` and try it yourself. Also, all these tests are on Windows; it probably changes for Linux.
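The doubled-backslash mismatch above can be reproduced with plain `fnmatch`, assuming (as the debug output suggests) that the filter compares pattern and path as literal strings:

```python
import fnmatch

# The path as it appears in the debug log when syncing from the drive
# root: two backslashes after "d:" (the string below is d:\\Baks\1.rtf).
path = "d:\\\\Baks\\1.rtf"

# A pattern built by prepending "d:\" to --exclude "Baks*" carries only
# ONE backslash, so it cannot match the doubled separator in the path.
print(fnmatch.fnmatchcase(path, "d:\\Baks*"))    # False

# A pattern carrying the doubled separator does match.
print(fnmatch.fnmatchcase(path, "d:\\\\Baks*"))  # True
```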
> It's not just "relative paths". If I have mixed data files, such as this, I cannot effectively exclude one while including the other:
>
> dirname/log.1 dirname/log.2 dirname/log.3 dirname/log.foo.1 dirname/log.foo.2 dirname/log.foo.3
>
> As soon as I use any `--include=` statement that would include both, no combination of arguments allows me to exclude the other logs. Combined with the non-intuitive behavior of `--exclude='*'` having to come before `--include=[whatever]`, it makes for a very confusing operation.
>
> aws s3 sync --exclude='*' --include='log.*' --exclude='log.foo.*' dirname/ s3://bucket/dirname/
`--exclude` never worked for me: I was not able to exclude an entire folder using `--exclude '*'` while including some specific files using `--include 'main.js' --include 'main.css'`.
None of the above commands (whatever the cwd is) will exclude `/bin` from being synced. Using `"*/bin/*"` would work, but it's unwanted since it would also exclude other subdirectories like `home/foo/comp/bin/`.