Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

azcopy sync --exclude-path not working #796

Open matevzg opened 4 years ago

matevzg commented 4 years ago

Which version of the AzCopy was used?

azcopy 10.3.3

Which platform are you using? (ex: Windows, Mac, Linux)

Windows

What command did you run?

azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1" --exclude-path="f2 f3" --exclude-path="f4" --recursive=true --cap-mbps=60 --delete-destination=true --log-level=DEBUG

What problem was encountered?

Folders "f1", "f2 f3", "f4" were not excluded from scanning and syncing.

How can we reproduce the problem in the simplest way?

Rerun the command above.

Have you found a mitigation/solution?

No.

matevzg commented 4 years ago

This also does not work: azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1;f2 f3;f4"

Nor does this: azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path=f1

Or this: azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1"

Local structure:
"V:\Folder1"
"V:\Folder1\f1", "V:\Folder1\f1\..."
"V:\Folder1\f2 f3", "V:\Folder1\f2 f3\..."
"V:\Folder1\f4", "V:\Folder1\f4\..."

It always scans the complete tree structure. Strange, this is. ¯\_(ツ)_/¯

JohnRusk commented 4 years ago

@zezha-msft Did you see this one?

zezha-msft commented 4 years ago

Hi @matevzg, sorry for the delayed reply.

Unfortunately, I wasn't able to repro this on my end; the exclude-path flag is working as expected.

Could you please clarify the observed behavior? Were the excluded folders still getting synced? They always get scanned, but they shouldn't be replicated to the destination.

qudiransari commented 4 years ago

Hi @matevzg,

I can see a syntax issue. Could you please try the syntax below: "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path f1;f2 f3;f4

ramuadapa commented 4 years ago

Command executed on Linux: azcopy copy "/mnt/" "https://prodneuazcopyst.blob.core.windows.net/xxxxx/[SAS]" --recursive=true --follow-symlinks=false --exclude-path /mnt/.snapshot/

Still scanning the .snapshot folder.


INFO: Scanning...
INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/download_migration because --follow-symlinks is false
INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/software/51050882_export/51050882_EXP1_part1.exe because --follow-symlinks is false
INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/software/51050882_export/51050882_EXP1_part2.rar because --follow-symlinks is false

JohnRusk commented 4 years ago

I think it will work as desired if you remove the trailing / from the exclude parameter. (Maybe we should automatically remove those).

I think that what you've used is being interpreted by the tool as "don't scan any directories inside the snapshot folder".

ramuadapa commented 4 years ago

I tried both --exclude-path "/mnt/.snapshot" and "--exclude-path=/mnt/.snapshot"; neither works. exclude-path is not working in AzCopy.

JohnRusk commented 4 years ago

Thanks for the test results @ramuadapa

JohnRusk commented 4 years ago

Depending on whether we can reproduce this, and how it gets triaged if we do, we might be able to fix this in release 10.4. Maybe...

adreed-msft commented 4 years ago

@ramuadapa just a thought, but could you try not having the root folder on that path?

ex. --exclude-path=.snapshot

IIRC we check for a prefix on a relative path for exclude path.
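A minimal Python sketch of the relative-prefix idea, to show why `.snapshot` and `/mnt/.snapshot` behave differently when the source is `/mnt/`. This is a model, not AzCopy's code: the function name `is_excluded`, the component-wise comparison, the backslash and trailing-slash normalization, and the assumption that each file is compared by its path relative to the source root are all illustrative assumptions.

```python
def is_excluded(rel_path: str, exclude_flag: str) -> bool:
    """Model of a relative path-prefix check (hypothetical, not AzCopy's code).

    rel_path:     path of a file relative to the source root
    exclude_flag: the --exclude-path value, a ';'-separated list
    """
    # Treat Windows and POSIX separators the same (assumption).
    rel = rel_path.replace("\\", "/")
    for prefix in exclude_flag.split(";"):
        # Drop trailing slashes and skip empty entries (assumption).
        prefix = prefix.replace("\\", "/").rstrip("/")
        if not prefix:
            continue
        # Match whole components, so "f1" does not exclude "f10/...".
        if rel == prefix or rel.startswith(prefix + "/"):
            return True
    return False


if __name__ == "__main__":
    # With /mnt/ as the source, this file's relative path starts with ".snapshot/".
    rel = ".snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/download_migration"
    print(is_excluded(rel, ".snapshot"))       # True
    print(is_excluded(rel, "/mnt/.snapshot"))  # False: an absolute path is never a relative prefix
    print(is_excluded("f2 f3/a.txt", "f1;f2 f3;f4"))  # True
```

Under this model an absolute --exclude-path value never matches anything, which would explain the /mnt/.snapshot attempts failing; whether AzCopy compares whole path components or raw string prefixes is not stated in this thread.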

adreed-msft commented 4 years ago

(that being said though, I've seen users do both ways, so this could arguably be a usability complaint)

daweins commented 4 years ago

Upvote for the relative & absolute path issue; I just saw this at a customer as well.

ramuadapa commented 4 years ago


I had already tried this, with the relative path as well, before updating the blog; we are still seeing the issue.

JohnRusk commented 4 years ago

@adreed-msft, @zezha-msft , @nakulkar-msft Any thoughts?

subhakarthi711 commented 4 years ago

./azcopy sync "I:\final\1001\1001001" "container?sv=token" --delete-destination=true --include-pattern=".dwg;.pdf" --exclude-path="1/Obsolete;1/Quality;1/Quote;2;3;4"

--exclude-path string Exclude these paths when copying. This option does not support wildcard characters (*). Checks relative path prefix(For example: myFolder;myFolder/subDirName/file.pdf). When used in combination with account traversal, paths do not include the container name.

Shaboogity commented 4 years ago

Just curious whether there's been any headway on this issue. I'm running into the same problem, but I'm using copy instead of sync.

My source is "H:\BackupRoot\SiteBackup" and I want to exclude "H:\BackupRoot\SiteBackup\SQLBackup".

I've tried the following combinations with no luck: --exclude-path="SQLBackup" --exclude-path="H:\BackupRoot\SiteBackup\SQLBackup\database.mdf" --exclude-path="SiteBackup\SQLBackup\database.mdf" --exclude-path="SiteBackup\SQLBackup"

No errors, and I can see the flag in the Job-Command line of the log. I'm using AzCopy 10.4.3 x64 on Windows.

berguner commented 4 years ago

I am using AzCopy v10.5.0 on Windows x64 and it is still not working. I tried both full paths and prefixes with no luck.

We are planning to use AzCopy in a production environment and this feature is urgently needed. I would be grateful if you could fix this issue.

nakulkar-msft commented 4 years ago

Hi @berguner, can you post the command you ran, and the AzCopy log file? Please make sure you redact any SAS tokens used in the command.

berguner commented 4 years ago

I am trying to exclude a subfolder called "Thumbnail_Images" and I tried:

azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path Thumbnail_Images

None of the above worked. The log files don't show much because the "Thumbnail_Images" files were already uploaded and there is nothing to sync. Based on the number of files scanned, I can tell that "Thumbnail_Images" is still being scanned on both the source and the destination.

I am using the cp command below for the time being, but it is not ideal because it only compares timestamps. In the logs of the cp command, I can see that the number of scanned files doesn't include the files in the "Thumbnail_Images" folder.
azcopy.exe cp $full_folder_path "$blob_container/$sas" --recursive --overwrite ifSourceNewer --exclude-path Thumbnail_Images

nakulkar-msft commented 4 years ago

@berguner The exclude-path option uses a relative path, and I'd expect 'azcopy cp src dst --recursive --exclude-path Thumbnail_Images' to work. Can you verify through the AzCopy logs that the value is not enclosed in quotes when passed to AzCopy, as in this example:
Job-Command copy /home/nakulkar https://myaccount.blob.core.windows.net/container?SAS --exclude-path="NoQuotesHere" --recursive
The AzCopy logs are in $HOME/.azcopy. I'll have a look after you post the logs here.
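To illustrate why stray quotes matter: if quote characters survive into the value AzCopy receives, they become part of the prefix, and a relative path such as `Thumbnail_Images/img1.jpg` no longer starts with it. The sketch below is a simplified Python model with a hypothetical `prefix_matches` helper and a made-up file name; it does not reproduce AzCopy's argument parsing, only the effect of literal quotes on a prefix comparison.

```python
def prefix_matches(rel_path: str, prefix: str) -> bool:
    """Hypothetical component-wise prefix check on a relative path."""
    rel = rel_path.replace("\\", "/")
    prefix = prefix.replace("\\", "/").rstrip("/")
    return rel == prefix or rel.startswith(prefix + "/")


if __name__ == "__main__":
    rel = "Thumbnail_Images/img1.jpg"      # hypothetical file under the source
    received_ok = "Thumbnail_Images"       # the shell removed the quotes
    received_bad = '"Thumbnail_Images"'    # quotes were passed through literally
    print(prefix_matches(rel, received_ok))               # True
    print(prefix_matches(rel, received_bad))              # False
    print(prefix_matches(rel, received_bad.strip('"')))   # True once the quotes are stripped
```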

gfaessler commented 3 years ago

azcopy version 10.9.0

Same issue here. I tried a relative path and the bare folder name; all the excluded folders/files are still replicated. My structure in the container is as follows:

/site1/App_Data/ClientDependency
/site2/App_Data/ClientDependency
...

I would like to exclude all "App_Data/ClientDependency" folders.

azcopy sync 'source' 'destination' --recursive --exclude-path='App_Data/ClientDependency'  -> doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='/App_Data/ClientDependency'  -> doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='ClientDependency'  -> doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='site1/App_Data/ClientDependency'  -> works

It looks like the exclude-path must be specified starting from root.

svivekiyer commented 3 years ago
  1. It's a relative path that must be given to the exclude-path option.

Source: site, Destination: site

Folder structure:
site/site1/App_Data/ClientDependency
site/site2/App_Data/ClientDependency
site/site3/App_Data/ClientDependency

Try this command: ./azcopy sync Source "https://StorageaccountName.blob.core.windows.net/containerName/site/?sv=A....D" --put-md5 --recursive --exclude-path 'site1/App_Data/ClientDependency;site2/App_Data/ClientDependency'

This syncs only the site3 folder to the Azure container.

gfaessler commented 3 years ago

Thank you for your answer. Yes, with a relative path it works fine. My issue is that I have an unknown number of sites, and I would like to define a single exclude-path rule that excludes a folder for all sites.

Wildcards are not supported, exclude-pattern applies only to files, and --list-of-files is not supported on sync, so I guess my only option would be to build a PowerShell script that goes through the structure and calls azcopy sync on the folders I want to sync.
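A rough sketch of a variant of that workaround, written in Python rather than PowerShell and assuming the source is a local folder: instead of one sync per folder, it walks the tree once and builds a single root-anchored, ';'-separated --exclude-path value covering every `App_Data/ClientDependency` folder. `build_exclude_list` is a hypothetical helper; on very large trees the resulting value could run into command-line length limits.

```python
import os
from pathlib import Path


def build_exclude_list(source_root: str, parent: str = "App_Data",
                       target: str = "ClientDependency") -> str:
    """Hypothetical helper: return a ';'-joined list of every <parent>/<target>
    folder under source_root, as '/'-separated paths relative to source_root."""
    root = Path(source_root)
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        here = Path(dirpath)
        # Record the target folder when it sits directly under a <parent> folder.
        if here.name == parent and target in dirnames:
            found.append((here / target).relative_to(root).as_posix())
    return ";".join(sorted(found))


if __name__ == "__main__":
    import tempfile

    # Build a throwaway tree shaped like the one in this thread.
    with tempfile.TemporaryDirectory() as tmp:
        for site in ("site1", "site2", "site3"):
            (Path(tmp) / site / "App_Data" / "ClientDependency").mkdir(parents=True)
        print(build_exclude_list(tmp))
        # site1/App_Data/ClientDependency;site2/App_Data/ClientDependency;site3/App_Data/ClientDependency
```

The printed string has the same shape as svivekiyer's 'site1/App_Data/ClientDependency;site2/App_Data/ClientDependency' value above and could be quoted and passed to --exclude-path.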

zezha-msft commented 3 years ago

To clarify, exclude-path works on relative paths under the given source. And exclude-pattern is for file names only. We'll try to clarify the docs to avoid this confusion.

--exclude-path string Exclude these paths when copying. This option does not support wildcard characters (*). Checks relative path prefix(For example: myFolder;myFolder/subDirName/file.pdf). When used in combination with account traversal, paths do not include the container name.

--exclude-pattern string Exclude these files when copying. This option supports wildcard characters (*).
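The practical difference between the two flags, as described in the help text above: --exclude-pattern looks only at a file's name (with wildcards), while --exclude-path looks at the path relative to the source. A minimal Python model of the name-only behavior, using `fnmatch` as a stand-in for AzCopy's wildcard matching (an assumption; case sensitivity and the exact wildcard rules may differ). `excluded_by_pattern` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def excluded_by_pattern(rel_path: str, exclude_pattern: str) -> bool:
    """Hypothetical model of --exclude-pattern: ';'-separated wildcards
    matched against the file name only, never against folder names."""
    name = PurePosixPath(rel_path.replace("\\", "/")).name
    return any(fnmatchcase(name, p) for p in exclude_pattern.split(";") if p)


if __name__ == "__main__":
    f = "site1/App_Data/ClientDependency/bundle.js"
    print(excluded_by_pattern(f, "ClientDependency"))  # False: folder names are not checked
    print(excluded_by_pattern(f, "*.js"))              # True
    print(excluded_by_pattern("site1/web.config", "web.config"))  # True
```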

@gfaessler we understand that there's not enough flexibility here to accommodate scenarios like yours. We were thinking that perhaps providing some kind of include-regex and exclude-regex may help, it'd be used over the entire relative path (under the source root) of each file. Please let us know if you have any feedback about that idea. Thanks.
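A sketch of what a regex over the whole relative path could look like for gfaessler's case, assuming the expression is searched anywhere in the '/'-separated relative path. It uses Python's `re` purely for illustration; the thread later mentions an --exclude-regex flag, and this sketch does not claim to reproduce that flag's matching rules. `excluded_by_regex` and `RULE` are hypothetical names.

```python
import re
from typing import Iterable


def excluded_by_regex(rel_path: str, patterns: Iterable[str]) -> bool:
    """Hypothetical: exclude a file if any regex is found anywhere in its
    relative path (re.search semantics, '/' as the separator)."""
    rel = rel_path.replace("\\", "/")
    return any(re.search(p, rel) for p in patterns)


# One rule covers every site: an "App_Data/ClientDependency" segment at any depth.
RULE = [r"(^|/)App_Data/ClientDependency(/|$)"]

if __name__ == "__main__":
    print(excluded_by_regex("site7/App_Data/ClientDependency/x.js", RULE))     # True
    print(excluded_by_regex("site7/App_Data/Other/x.js", RULE))                # False
    print(excluded_by_regex("site1/App_Data/ClientDependencyOld/x.js", RULE))  # False
```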

gfaessler commented 3 years ago

@zezha-msft Providing include/exclude path regex would definitely be a useful feature to handle this kind of scenario. Otherwise supporting wildcard characters in exclude-path would also do the job in my scenario.

sausag3 commented 3 years ago

Just tried using AzCopy because Storage Explorer wasn't flexible enough. After 30 minutes of struggling to make exclude-path work, I ended up here. My scenario: I have a complex hierarchy several layers deep, and at the deepest levels there's a collection of folders; I want to exclude one of those folders (they share a common name) from a sync from blob storage to a local drive. I'd hate to have to specify each and every folder to exclude explicitly. Something simpler, like the glob syntax ( **/Folder/** ), or just skipping any folder that matches the string in the exclude path, would be perfect (and it would be good to expose that in Storage Explorer eventually too).
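The behavior requested here (skip any folder with a given name, wherever it sits) can be modeled as a check on path components rather than on a prefix. A minimal Python sketch; `under_dir_named` is a hypothetical helper, and it models the requested behavior, not how --exclude-path works in the versions discussed in this thread.

```python
from pathlib import PurePosixPath


def under_dir_named(rel_path: str, dir_name: str) -> bool:
    """True if any folder component of rel_path equals dir_name exactly.
    The last component is the file name, so it is not treated as a folder."""
    parts = PurePosixPath(rel_path.replace("\\", "/")).parts
    return dir_name in parts[:-1]


if __name__ == "__main__":
    print(under_dir_named("a/b/c/Folder/file.txt", "Folder"))  # True
    print(under_dir_named("a/b/Folder2/file.txt", "Folder"))   # False: names must match exactly
```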

basnijholt commented 1 year ago

Just ran into this issue after struggling with this option.

The documentation would really benefit from the additional text:

--exclude-path must be the full path without the container name

@zezha-msft, who would need to be convinced to prioritize improving this command so that it handles relative paths?

JohnRusk commented 1 year ago

@adreed-msft ^

arunim2405 commented 1 year ago

does --exclude-regex solve this?

danielcgonzalez commented 1 year ago

exclude-path is not working properly; when will it be fixed?

stephanadler1 commented 6 months ago

@JohnRusk and @zezha-msft: feel free to contact me internally via teams if you need more details. @JohnRusk has my alias.

I have a consistent repro of this issue on Windows 10/11.

I've recently forced all my PowerShell scripts to use PowerShell Core (pwsh.exe) instead of PowerShell Desktop (powershell.exe) and suddenly I had this problem as well where excluded files and folders are being uploaded through AzCopy.

Interestingly, when I took the command line that pwsh prints out and used it in a cmd.exe or powershell.exe shell, everything worked as expected: excluded files and folders were still excluded. My version history showed that the files started appearing in the storage account after I switched from powershell.exe to pwsh.exe.

Here is the code to reproduce the problem and a simplistic mitigation.

My scripts invoke AzCopy in PowerShell as follows:

& azcopy sync C:\somefolder "https://storageaccount/somefolder?sastoken" --exclude-pattern="web.config"

I can mitigate the problem in pwsh.exe if I pipe the invocation of AzCopy through cmd.exe, like so:

& cmd.exe /c "azcopy sync C:\somefolder `"https://storageaccount/somefolder?sastoken`" --exclude-pattern=`"web.config`""

Maybe others on the thread can also check which shell they are using and possibly corroborate that the difference in behavior is related to the type of shell that is being used.

The versions used for the testing:

Shell              | Executable     | Version
PowerShell Core    | pwsh.exe       | 7.4.2
PowerShell Desktop | powershell.exe | 5.1.19041.4291

stephanadler1 commented 6 months ago

I can answer my own comment. If you're using PowerShell Core with a version ≥ 7.3 the default behavior for escaping/encapsulating command line arguments has changed! It is described in PSNativeCommandArgumentPassing. The change became mainstream with version 7.3.

Calling azcopy.exe through cmd.exe works because, on Windows, when PowerShell Core invokes cmd.exe through the call operator, it automatically switches to the legacy behavior.

When azcopy.exe is invoked directly the new behavior is used, which leads to the issues (in my examples above).

JohnRusk commented 6 months ago

If you're using PowerShell Core with a version ≥ 7.3 the default behavior for escaping/encapsulating command line arguments has changed! It is described in PSNativeCommandArgumentPassing. The change became mainstream with version 7.3.

Wow, I did not know that!