allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.73k stars 657 forks

Problem creating datasets with Azure storage when multi file #1285

Open SamGalanakis opened 5 months ago

SamGalanakis commented 5 months ago

I am using Azure for storing everything, and authentication works. For example:

from clearml import Dataset

dataset = Dataset.create(dataset_name="sanity_test", dataset_project="LOGOCube")
dataset.add_files("README.md")
dataset.upload()
dataset.finalize()

This works fine and I can see the file on Azure. I also tried larger (single) files with no issue.

But when I run the provided example, which uses a folder of files:

# Download CIFAR dataset and create a dataset with ClearML's Dataset class
from clearml import StorageManager, Dataset

manager = StorageManager()

dataset_path = manager.get_local_copy(
    remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
)

dataset = Dataset.create(
    dataset_name="cifar_dataset", dataset_project="dataset_examples"
)

# Prepare and clean data here before it is added to the dataset

dataset.add_files(path=dataset_path)

# Dataset is uploaded to the ClearML Server by default
dataset.upload()

dataset.finalize()

It logs the folder and some metadata, then starts throwing errors:

python clearml_dataset_creation.py 
ClearML results page: https://app.clear.ml/projects/14ccfc7a20f54b02b2539ba3b36da47c/experiments/bfeb2290824d4adbb2b67e22236ea53d/output/log
ClearML dataset page: https://app.clear.ml/datasets/simple/14ccfc7a20f54b02b2539ba3b36da47c/experiments/bfeb2290824d4adbb2b67e22236ea53d
Generating SHA2 hash for 8 files
100%|█████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 72.26it/s]
Hash generation completed
2024-06-19 09:11:01,093 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /dataset_examples/.datasets/cifar_dataset/cifar_dataset.bfeb2290824d4adbb2b67e22236ea53d/metrics/HTML/readme.html/HTML_readme.html_00000000.html (403): <?xml version="1.0" encoding="utf-8"?>
<Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:ed761ce3-201e-0024-4428-c29c5e000000
Time:2024-06-19T09:10:58.9893281Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>
2024-06-19 09:11:01,093 - clearml.metrics - WARNING - Failed uploading to https://clearmltest.blob.core.windows.net/clearml (Failed uploading object /dataset_examples/.datasets/cifar_dataset/cifar_dataset.bfeb2290824d4adbb2b67e22236ea53d/metrics/HTML/readme.html/HTML_readme.html_00000000.html (403): <?xml version="1.0" encoding="utf-8"?>
<Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:ed761ce3-201e-0024-4428-c29c5e000000
Time:2024-06-19T09:10:58.9893281Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>)
2024-06-19 09:11:01,094 - clearml.metrics - ERROR - Not uploading 1/5 events because the data upload failed
Uploading dataset changes (8 files compressed to 162.15 MiB) to azure://clearmltest.blob.core.windows.net/clearml

What is the issue here?

I am using the following config:

sdk {
    development {
        default_output_uri: azure://clearmltest.blob.core.windows.net/clearml/
    }
    azure.storage {
        containers: [
            {
                account_name: ${AZURE_STORAGE_ACCOUNT}
                account_key: ${AZURE_STORAGE_KEY}
                container_name: clearml
            }
        ]
    }
}
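One thing worth ruling out first is whether the ${AZURE_STORAGE_ACCOUNT} and ${AZURE_STORAGE_KEY} substitutions in the config above actually resolve in the process environment. If the account key is not picked up, a token-based credential may be used instead, which would send a Bearer Authorization header and could explain the "Authentication scheme Bearer is not supported" error. A minimal sketch to check this before running ClearML (the helper name is my own, not part of any SDK):

```python
import os

def missing_azure_env(required=("AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY")):
    """Return the names of any required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

if __name__ == "__main__":
    missing = missing_azure_env()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("Both Azure storage variables are set.")
```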
jkhenning commented 5 months ago

Hi @SamGalanakis , this perhaps seems related to the Azure python package version - can you share the versions of the python packages you're using?

SamGalanakis commented 5 months ago

Hi @jkhenning, this is the pip freeze output:

requirements.txt

SamGalanakis commented 5 months ago

Also, I see that it does store the main data but fails on some metadata / auxiliary files.

RequestId:48002e24-301e-005a-022e-c20c19000000
Time:2024-06-19T09:52:41.5150691Z</Message><AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail></Error>)
2024-06-19 09:52:43,794 - clearml.metrics - ERROR - Not uploading 1/5 events because the data upload failed
Uploading dataset changes (8 files compressed to 162.15 MiB) to azure://clearmltest.blob.core.windows.net/clearml
File compression and upload completed: total size 162.15 MiB, 1 chunk(s) stored (average size 162.15 MiB)
SamGalanakis commented 5 months ago

@jkhenning Any update on this?

jkhenning commented 5 months ago

Hi @SamGalanakis,

I've looked around for this error; it seems related either to a misaligned system clock or to some headers missing from the request (even though I found no change in documentation regarding this).

Since there was no change in the ClearML SDK code handling this process, I would first try downgrading the azure-storage-blob version (perhaps try 12.0.0) and see if it helps
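Before downgrading, it may help to see which Azure-related package versions are actually installed in the environment. A small sketch using the standard-library importlib.metadata (the helper name and the exact list of distributions are my own choices):

```python
import importlib.metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None

for dist in ("azure-storage-blob", "azure-core", "azure-identity", "clearml"):
    print(dist, installed_version(dist) or "not installed")
```

If azure-storage-blob turns out to be much newer than the suggested pin, something like pip install "azure-storage-blob==12.0.0" would perform the downgrade.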