Azure / azure-sdk-for-python


HTTP error creating a file from Python #32123

Closed danielIgnis closed 12 months ago

danielIgnis commented 1 year ago

Description

Hello. I am trying to upload the contents of a file to our Azure Data Lake Storage, and I am getting the following error on the line `file = cls.__directory_clients[directory_path].create_file(file=file_name)`:

(screenshot of the HTTP error omitted)

I am not sure what to do, because I do not handle the HTTP requests myself. The example code is attached:

```python
from typing import Dict

from azure.storage.filedatalake import (
    DataLakeDirectoryClient,
    DataLakeServiceClient,
    FileSystemClient,
)


class DataLakeStorageClient:
    """Class for working with Azure DataLake Storage"""

    __file_system_client: FileSystemClient | None = None
    __directory_clients: Dict[str, DataLakeDirectoryClient] = {}

    @classmethod
    def connect(cls):
        """Connect to Azure DataLake Storage"""
        # AzureDataLake is our own settings object holding the
        # account URL, SAS token, and filesystem name.
        if cls.__file_system_client is None:
            cls.__file_system_client = DataLakeServiceClient(
                account_url=AzureDataLake.SAS_URL,
                credential=AzureDataLake.TOKEN,
            ).get_file_system_client(AzureDataLake.FILESYSTEM_NAME)

    @classmethod
    def upload(
        cls,
        directory_path: str,
        file_name: str,
        serialized_data: bytes | str,
    ) -> None:
        # Cache one directory client per directory path.
        if directory_path not in cls.__directory_clients:
            cls.__directory_clients[directory_path] = (
                cls.__file_system_client.get_directory_client(directory_path)
            )

        file = cls.__directory_clients[directory_path].create_file(file=file_name)
        file.append_data(data=serialized_data, offset=0, length=len(serialized_data))
        file.flush_data(len(serialized_data))
```

Thank you in advance!

github-actions[bot] commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

swathipil commented 1 year ago

Hi @danielIgnis - thanks for opening an issue! We'll take a look asap!

jalauzon-msft commented 1 year ago

Hi @danielIgnis, could you please check the format of your account URL (AzureDataLake.SAS_URL in your sample)? For the DataLake SDK, you should provide an account URL pointing to the DFS endpoint, like https://<account_name>.dfs.core.windows.net/. My guess is that you may be providing a URL pointing to the Blob endpoint, blob.core.windows.net?
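
For reference, a minimal sketch of constructing the client against the DFS endpoint; the account name, SAS token, and filesystem name below are placeholders, not values from this issue:

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders: substitute your own account name, SAS token, and filesystem.
ACCOUNT_URL = "https://<account_name>.dfs.core.windows.net/"  # DFS, not blob
SAS_TOKEN = "<sas_token>"

service_client = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=SAS_TOKEN)
file_system_client = service_client.get_file_system_client("<filesystem_name>")
```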

danielIgnis commented 1 year ago

Hello @jalauzon-msft. Yes, you are absolutely right, we are using a URL pointing to the Blob endpoint (blob.core.windows.net). What is the difference between the two URLs? As I understand it, the URL is pointing to our Data Lake either way. Thank you!

danielIgnis commented 12 months ago

Another question we have is: what are the differences and advantages between the azure-storage-file-datalake and azure-storage-blob libraries? In the past we used azure-storage-blob to upload files to an Azure Storage account, but now we have seen that we can use that same library to upload data to the Data Lake when the URL points to blob.core.windows.net. We would therefore like to know the differences and advantages of using one type of URL versus the other, as well as of the two libraries.

vincenttran-msft commented 12 months ago

Hi @danielIgnis, glad you were able to reach a resolution to your original issue. To answer your follow-up questions: the main difference between azure-storage-file-datalake and azure-storage-blob is that DataLake supports a hierarchical namespace. While Blob Storage is great for mostly unstructured or semi-structured data in a flat namespace, DataLake lends itself to better organization and structure because of this hierarchical namespace (and thus concepts such as files and directories).
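
As a rough sketch of what the hierarchical namespace buys you in practice (all names below are placeholders): directories are first-class objects, so an entire subtree can be created, renamed, or deleted in one operation instead of iterating over blob name prefixes.

```python
from azure.storage.filedatalake import DataLakeServiceClient

service_client = DataLakeServiceClient(
    account_url="https://<account_name>.dfs.core.windows.net/",  # placeholder
    credential="<sas_token>",  # placeholder
)
fs = service_client.get_file_system_client("<filesystem_name>")  # placeholder

# Create a nested directory path in one call.
directory = fs.create_directory("raw/2023/11")

# Renaming moves everything under the directory in a single operation;
# the new name must be given as "{filesystem}/{new directory path}".
directory.rename_directory(f"{fs.file_system_name}/raw/2023/archived-11")
```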

Strictly speaking, you should only use the Blob endpoint (.blob.core.windows.net) when using the Blobs package, and likewise you should only use the DFS endpoint (.dfs.core.windows.net) when using the DataLake package.

While some functionality may work when using mismatched endpoints and packages, there is no guarantee, and these are not officially supported scenarios. I would therefore strongly recommend using the endpoint that matches the package, especially if you are using an account with HNS (hierarchical namespace) enabled, which you can check from your Azure Portal.
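
Concretely, the pairing would look something like this (placeholders again, not values from this issue):

```python
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

# Blobs package paired with the Blob endpoint...
blob_service = BlobServiceClient(
    account_url="https://<account_name>.blob.core.windows.net/",  # placeholder
    credential="<sas_token>",  # placeholder
)

# ...and the DataLake package paired with the DFS endpoint.
datalake_service = DataLakeServiceClient(
    account_url="https://<account_name>.dfs.core.windows.net/",  # placeholder
    credential="<sas_token>",  # placeholder
)
```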

For more information on DataLake, please see the link here! With that being said, I believe the original question has been addressed, so I will continue with closing the issue. Hopefully this response also clarifies all of your follow-up questions!

Thanks!