az download-batch tries to create directory on the root filesystem

katcipis commented 6 years ago

Environment summary

Problem happens with az installed through pip on arch linux and on Docker.

Version:

azure-cli (2.0.28)

acr (2.0.21)
acs (2.0.27)
advisor (0.1.2)
appservice (0.1.28)
backup (1.0.6)
batch (3.1.10)
batchai (0.1.5)
billing (0.1.7)
cdn (0.0.13)
cloud (2.0.12)
cognitiveservices (0.1.11)
command-modules-nspkg (2.0.1)
configure (2.0.14)
consumption (0.2.2)
container (0.1.19)
core (2.0.28)
cosmosdb (0.1.19)
dla (0.0.18)
dls (0.0.19)
eventgrid (0.1.10)
extension (0.0.9)
feedback (2.1.0)
find (0.2.8)
interactive (0.3.16)
iot (0.1.17)
keyvault (2.0.19)
lab (0.0.17)
monitor (0.1.2)
network (2.0.24)
nspkg (3.0.1)
profile (2.0.19)
rdbms (0.0.12)
redis (0.2.11)
reservations (0.1.1)
resource (2.0.24)
role (2.0.20)
servicefabric (0.0.10)
sql (2.0.22)
storage (2.0.26)
vm (2.0.27)

Python location '/usr/bin/python'
Extensions directory '/root/.azure/cliextensions'

Python (Linux) 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]

Im trying to download some files stored on azure blob storage using:

az storage blob download-batch --destination . --source --account-name --account-key

My files on this container all start with "/", like "/test/file1" and "/test/file2". When I download them I got this error:

File "/usr/lib/python3.6/os.py", line 220, in makedirs mkdir(name, mode) PermissionError: [Errno 13] Permission denied: '/test'

It seems to be trying to create a directory on my root filesystem named "/test" (the prefix of the files on the container).

Executing inside a container I confirmed that the directory was actually being created on the root filesystem /test.

Using a --destination with an absolute path does not change the behavior, same problem.

troydai commented 6 years ago

@williexu do you want to take a look?

williexu commented 6 years ago

@katcipis why do your blob names start with the character '/'? A blob name starting with a path seperator ('/') is interpreted as an absolute path, hence why the destination is ignored and the blobs are being downloaded into your root. I'll ask service team about what the behavior should be, but this might not change.

katcipis commented 6 years ago

@williexu On the idea that using "/" as separators could allow the BLOB storage to be used as an hierarchical filesystem it made sense to start with "/" since all filesystems start with a root "/". Interpreting the source path as an absolute path is what I intended, the remote dir that I'm downloading is the absolute path "/test/somedir", the problem is ignoring the destination-path argument. I don't see any relation between the source path being absolute or not and ignoring the target/destination path.

It is like:

cp -r ./somedir /otherdir

To work ok. but:

cp -r /etc/somedir /otherdir

To not work because the source path is absolute. The source path is just the address of what I want to copy.

The problems with path's starting with "/" is something that I just noticed later on az cli (never had problem on aws s3 with this), I'm not sure if I failed to check this on some docs, or if it is something not very well documented. If blob files must not start with a "/" perhaps it would be better to not even allow files to be upload with this kind of naming. But again, being informed that "/" can be used as a separator to make it easier to organize files but not allow "/" on the start of filename is odd. It was intuitive to me that I would be able to do that and them later list "/" to get all files/dir's on the root.

williexu commented 6 years ago

@katcipis I completely understand the sentiment. However, consider how one would differentiate between a blob with the name '/blobname' and 'blobname'. -If download-batch was used, should one override the other arbitrarily based on which was downloaded first? -Another possible solution would be to create a separate (hidden) directory in the destination folder, but then again, there is always the possibility that that hidden directory, whatever we name it, will already exist.

@seguler to comment, for the record, storage explorer attempts to download to root.

katcipis commented 6 years ago

@williexu This is the problem of giving just half of a filesystem API to the user. Without enforcing some uniformity (like always starting with "/" to indicate the root, or never doing it) the idea of working with Azure Blob storage as a hierarchical filesystem will not work properly.

It does not seem to be what Azure wants since there is parameters like "--destination-path" clearly indicating that you can work with file paths. But then you must be aware of odd pitfalls like they can't start with "/". If not starting with "/" is mandatory for everything to work properly, it seems like a good idea to internally transform the paths to a canonical blob way to store things (like removing the leading /), or at least erroring and not allowing the client to upload things through the CLI with names that are invalid to use as a path.

Another problem that you will have that is unrelated to root is when a file is called "dir/dir2/file" and another one is called "dir/dir2//file", if this is allowed as blob filename, how will you download this ? I agree that when thinking about a filesystem this is ambiguous, but it is the tool (the cli) that is allowing the client to drive the system to a inconsistent state. I was able to upload files with naming that broke the dashboard, I don't know.. IMHO this does not seem good.

Regarding removing the "bug" label of the issue... what ? The tool is downloading file at the root filesystem...this is not a bug ? An example, let's say I have two files:

/test/file
test/file

I use download-batch to download them at "/mydir". The results of this operation (thinking on root is equal nothing logic) are:

/mydir/test/file
/mydir//test/file

There is ambiguity and overwriting of files, but no reason at all to download things at the root filesystem of the client, so I'm pretty scared that the bug label has been removed.

And to handle the ambiguity, well the easiest way for me seems to provide a uniform API that works as a fs that people are used to and internally convert the paths to how blob storage actually works (since I was able to even break the dashboard, I want to help people to avoid that). This way even listing "/" works as it works on any filesystem (at least the ones that I know, it is not that much actually =P). It is not a complete solution because files can be upload through the API and other kinds of clients, so a definitive solution would have to be developed on the API that all clients use (including the cli). The best way to solve ambiguity usually is to avoid it settling in on the first place.

I understand that to implement a hierarchical filesystem like API on top of a flat storage is not trivial, but I really can't agree that at least the "downloading to root filesystem" behavior is NOT a bug. And the rest of the behaviors, like allowing ambuities to creep in, at least are on the "not favoring" the client class, the tool is not making it hard to do bad choices (and I really don't think that assuming that a path like "/dir1/file" will work fine is a stretch, but I may be wrong).

Of course this solution assumes that it is desired to provide a hierarchical filesystem on top of a flat one all the times, if the idea is to also provide the flat storage feature where a filename can always have any name it is simply not possible for both to coexist without generating ambiguities and odd corner cases. If this is the case it would be great to at least provide clear errors on the cli, like when you download a dir and a situation like this is found provide an error like "can't download from blob storage as directories because files name are completely messed up". In my situtation there are no clear errors (and sadly some downloading of stuff on the root filesystem) and no documentation informing me of anything (docs about this on the help of the cli would be great).

troydai commented 6 years ago

@katcipis thank you for the feedback and your insightful clarification of your point. It is very valuable to us and we treat your feedback seriously. We acknowledge the problem here: download file to the root is not ideal.

We're actually seeking a solution to this problem. However, we'd like to thoroughly study this problem and coordinate with other storage clients as well as the storage service. For example, as you suggested, maybe allowing / at the beginning of the blob name is something we should take a closer look. This is something we must work with the storage service team before the client can take any actions. We also want all the clients, namely PowerShell, StorageExplorer, Azure Portal, as well as CLI to behave accordingly so the problem is addressed with the same solution across the board. We may limit the CLI's capability to download to root for the short-term to prevent this misbehavior.

It may take time and effort beyond CLI team's scope, but we will drive to solve this.

Again, thank you for taking time providing us the feedback. We value your insight and appreciate you using our service. I'm adding the bug label back to this issue, and we will post back when we have a solution.

seguler commented 6 years ago

I don't understand - why would you upload your files to the following URL: https://myaccount.blob.core.windows.net/mycontainer//myblob

This won't happen if you have initially started from a local file system, used Azure CLI, Storage Explorer, PowerShell and uploaded a directory structure to Blob storage. In this case, you must have used one of the SDKs or the one-off file uploads of CLI/PSH, and uploaded your files with /myblob name. Unfortunately this is not a supported name in file systems. Its Blob URL also ends up having double slashes as well.

In my opinion, expecting this to work is unreasonable.

katcipis commented 6 years ago

@seguler if "this is not a supported name in file systems" why I can pass it as a parameter on the cli and it is accepted ? Also the statement that this not a supported name in "file systems" is simply not true, there are filesystems where using as a file path a name that starts with "/" is valid, as an example is the linux that I'm using right now. It may not always be the case, but saying that it is not a supported names on file systems (plural) does not seem to be correct.

I'm just expecting to tool to help me with what is acceptable or not. If you think that it is unreasonable to any human being to imagine that running:

az storage blob upload -n "/some/file"

Is completely absurd and idiotic than just leave everything as it is right now, sounds great =)

katcipis commented 6 years ago

Thanks @troydai

seguler commented 6 years ago

Looking at the command you ran one more time, I kind of agree that one may find uploading file with an absolute path as such: "/some/file"... You could expect CLI to upload it to domain.com/mycontainer/some/file rather than to domain.com/mycontainer//some/file.

Re. Linux - it is still not a supported file name. '/' is a reserved character for directory separation; cannot be part of the file name. However in an object store, object name can contain '/'.. So by dropping the leading '/' in CLI, we may actually break the object storage behavior just to be compatible with file system behavior which I think is not a good idea...

katcipis commented 6 years ago

@seguler On "Linux - it is still not a supported file name", not talking about filenames here, talking about file paths which is the address of a file relative to the root of the filesystem (some file systems also support addressing files relative to the current working directory, which does not seem to apply on this case).

Discussing this is confusing, I see your point where you think about it just as a file name since it is a flat storage, but the features of download-batch indicates that the file name is actually not that, when they have "/" they are actually filepaths (even directories will be created on the host according to the "/" in the filename, so effectively it is representing a file path not a file name).

Azure will have to decide if it wants to deliver some file system like features on its blob storage or not. Remembering tha AWS does support it and it was always been a pleasure to work with it =) (at least for me).

williexu commented 6 years ago

@katcipis I've created a fix for this issue that will ignore the starting path separator and normalize blob names. A error will be thrown if someone attempts to download blobs that normalize to the same path- this shouldn't affect your scenario.

Feel free to try out the new fix using dev-setup, our edge builds, or docker before this is released.

Azure / azure-cli

az download-batch tries to create directory on the root filesystem #5837

Environment summary