Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License
602 stars, 217 forks

azcopy sync does not follow symlinks #718

Open ipepe opened 4 years ago

ipepe commented 4 years ago

Which version of the AzCopy was used?

10.3.1

Which platform are you using? (ex: Windows, Mac, Linux)

Docker container (Mac/Linux) FROM microsoft/dotnet:latest

What command did you run?

azcopy sync /source http://redacted?SASKEY

What problem was encountered?

INFO: Skipping over symlink at /source/redacted because --follow-symlinks is false

How can we reproduce the problem in the simplest way?

Not sure yet. Will try to prepare repository.

Have you found a mitigation/solution?

No. I have tried adding --follow-symlinks but it says that flag is unknown.

ipepe commented 4 years ago

Could it be related to #473 ?

JohnRusk commented 4 years ago

#473 was the incorrect processing of symlinks. That code is now fixed, but it's only used in copy, not sync. The --follow-symlinks flag is not currently supported for sync.

@zezha-msft may be able to comment on whether it is expected to be added for sync one day.
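To illustrate what "following" versus "skipping" symlinks means when a tool enumerates a source tree, here is a minimal Python sketch. It is not AzCopy's enumerator: `enumerate_files` and its `follow_symlinks` parameter are hypothetical names chosen for this example, and the cycle guard is one assumed way to avoid looping on links that point back into the tree.

```python
import os


def enumerate_files(root, follow_symlinks=False):
    """Return sorted relative paths of the regular files found under root.

    Hypothetical helper for illustration only. With follow_symlinks=False,
    symlinked files and symlinked directories are skipped. With True, they
    are traversed and reported under the link's own path.
    """
    found = []
    seen = set()  # real paths of directories already visited (cycle guard)
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # reached again through a link: do not descend
            continue
        seen.add(real)
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) and not follow_symlinks:
                continue  # the "skipping over symlink" case
            if os.path.isfile(full):  # also drops dangling links
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Calling `enumerate_files("/source")` would list only the real files, while `enumerate_files("/source", follow_symlinks=True)` would also list the link targets as if they were ordinary files, which is why the link information cannot be recovered later from the uploaded copy alone.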

zezha-msft commented 4 years ago

Hi @ipepe, symbolic links are not yet supported for the sync command, as we have to clarify the round-trip story first, e.g. if the user syncs back from remote to local, how should the files be treated.

zezha-msft commented 4 years ago

@ipepe perhaps you could share some insights with us? What are your expectations for the round-trip scenario?

ipepe commented 4 years ago

Huh. I'm not sure. My goal in using azcopy was offsite backup. A good backup tool should let you recover from an on-site failure. But here's the problem: the data I back up comes from Docker containers, and I treat them as black boxes, so I don't know what's inside or how that data is structured.

Edit: One of these Docker containers with a lot of symlinks is GitLab. You could just take the approach of uploading the data folder from the GitLab container and downloading it back, and that container should work (with the minor detail that GitLab has constant file permission problems).

zezha-msft commented 4 years ago

Hi @ipepe, so when we download the data back, it's ok to not recreate the symbolic links, right?

ipepe commented 4 years ago

In a perfect world, the data sent should be the same as the data received. I understand that this technical challenge may be hard, if not impossible, to overcome. Personally, I will archive all my data with tar first and then send it through azcopy to keep my data in check. I sacrifice a lot, but by doing this I'm sure that once I need that data back, I will get exactly what I sent.
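The tar-first workaround described above can be sketched in Python with the standard `tarfile` module, which stores symlinks as links rather than following them. This is only an illustration of the idea, not the commenter's actual script: `archive_tree` and `restore_tree` are hypothetical helpers, and the resulting archive would still have to be uploaded as a single file by whatever transfer tool is in use.

```python
import os
import tarfile


def archive_tree(src_dir, archive_path):
    """Pack the contents of src_dir into a gzip-compressed tar archive.

    tarfile.add() does not dereference links by default, so each symlink
    is recorded as a link entry instead of a copy of its target.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)


def restore_tree(archive_path, dest_dir):
    """Unpack an archive produced by archive_tree into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: the "data" filter rejects absolute
            # link targets and targets that escape dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

The trade-off is the one the comment mentions: the round trip is exact for relative links inside the tree, but the remote copy is one opaque archive, so individual files cannot be synced or fetched on their own.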

zezha-msft commented 4 years ago

Thanks @ipepe for the insights, I'll bring this feature request to our PM.

adreed-msft commented 4 years ago

Right, copy has --follow-symlinks but not Sync.

So, partly this is an issue with our error messages, and this is also somewhat an issue with Sync. @zezha-msft this should just be a case of adding the flag to sync, since it exists within the enumerators. I left this as a TODO during the copy refactoring.

adreed-msft commented 4 years ago

That, and testing.

gar1t commented 3 years ago

The lack of symmetry here is surprising, and I would have classified this as a bug rather than a missing feature. That someone (above) is tarring files before sending them suggests that sync is broken.

EraYaN commented 1 year ago

For our use case (publishing local files to Azure Files periodically), just following symlinks would be good enough, without a round trip story. Especially when symlinks point to outside the directory structure that is being synced (as a "view" of a much larger filesystem for example).

Currently we use rsync --copy-links. The man page describes it as "--copy-links transform symlink into referent file/dir", and that also does not have a round-trip story.
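The "transform symlink into referent" behaviour can be approximated locally with Python's `shutil.copytree`, which copies link targets instead of links when `symlinks=False`. This is a rough analogue for illustration, not a description of how rsync or AzCopy work internally; `copy_following_links` and `count_symlinks` are hypothetical helpers.

```python
import os
import shutil


def copy_following_links(src_dir, dest_dir):
    """Copy src_dir to a new dest_dir, replacing each symlink with its target.

    symlinks=False makes copytree copy the contents of linked files and
    recurse into linked directories. dest_dir must not exist yet, and a
    dangling link in src_dir raises shutil.Error.
    """
    return shutil.copytree(src_dir, dest_dir, symlinks=False)


def count_symlinks(root):
    """Count symlinks (to files or directories) anywhere under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if os.path.islink(os.path.join(dirpath, name)):
                total += 1
    return total
```

After such a copy, `count_symlinks` on the destination is zero even when the source was a "view" made of links into a larger filesystem, which is the sense in which this approach has no round-trip story.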

ld0614 commented 3 weeks ago

Any update on this feature request? I have a (Windows) mount point which I would like to be able to copy data from. Ideally, I'd like the mount point to simply be treated as a directory so that I could reverse the sync and copy back into a mount point, but that's less important as there are other ways to get the data back.