Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Unable to download file from output of job using mlclient - Azure python sdk v2 #34005

Closed taruntadikonda closed 2 months ago

taruntadikonda commented 5 months ago

Describe the bug: Unable to download a file from the outputs that are generated after a job has completed.

ml_client.jobs.download(name=job_name)

Using the above code, I am able to download all the files in the outputs. The Azure blob path they are downloaded from is azureml://datastores/workspaceartifactstore/ExperimentRun/dcid.job_name

When I specify a particular file to download, as in ml_client.jobs.download(name=job_name, output_name=file_name), the SDK looks in a different datastore, azureml://datastores/workspaceblobstore/paths/azureml/job_name/file_name, but the file is actually in a different location.
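To make the mismatch concrete, here is a minimal sketch that builds the two URIs described above side by side. The function names are hypothetical and exist only for illustration; the URI patterns are copied from this report, not from the SDK source.

```python
# Hypothetical helpers that only reproduce the two URI patterns quoted in this report.

def all_outputs_uri(job_name: str) -> str:
    """Location reported for ml_client.jobs.download(name=job_name)."""
    return (
        "azureml://datastores/workspaceartifactstore/"
        f"ExperimentRun/dcid.{job_name}"
    )


def single_file_uri_attempted(job_name: str, file_name: str) -> str:
    """Location the report says is used when output_name=file_name is passed."""
    return (
        "azureml://datastores/workspaceblobstore/"
        f"paths/azureml/{job_name}/{file_name}"
    )


if __name__ == "__main__":
    # Placeholder values, not taken from a real workspace.
    print(all_outputs_uri("my_job"))
    print(single_file_uri_attempted("my_job", "metrics.json"))
```

The two results point at different datastores (workspaceartifactstore versus workspaceblobstore), which matches the behavior described here.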

kristapratico commented 5 months ago

@taruntadikonda thanks for your feedback, @azureml-github can you take a look at this issue?

github-actions[bot] commented 5 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

banibrata-de commented 5 months ago

What type of job is this? Also could you share the debug logs for the download call?

For the SDK, you can enable debug logs by following these steps:

Make sure that when the ml_client is instantiated, the logging_enable parameter is set to "True". For example:

ml_client = MLClient(..., logging_enable=True)

Just before the call that is failing in the SDK, insert the following code:

import logging
logging.basicConfig(level=logging.DEBUG)
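The logging part of those steps can be wrapped in a small helper. This is a sketch using only the standard library. The helper name is hypothetical, and `force=True` (Python 3.8+) is an addition not mentioned above; it is there because `logging.basicConfig` otherwise does nothing when the root logger already has handlers, which is common in notebooks.

```python
import logging


def enable_debug_logging() -> logging.Logger:
    """Switch the root logger to DEBUG so SDK debug output is shown."""
    # force=True replaces any handlers that were already installed.
    logging.basicConfig(level=logging.DEBUG, force=True)
    return logging.getLogger()


if __name__ == "__main__":
    enable_debug_logging()
    # Then create the client with logging enabled and make the failing call, e.g.:
    #   ml_client = MLClient(..., logging_enable=True)
    #   ml_client.jobs.download(name=job_name, output_name=file_name)
    logging.getLogger(__name__).debug("debug logging is active")
```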

taruntadikonda commented 5 months ago

download_file.txt

Attached the log file.

banibrata-de commented 5 months ago

Thanks for the logs. It looks like we have a bug to fix for the single-file case. However, we don't have an ETA for this right now. Until then, please download all files, or download the single file from the Azure ML studio portal.
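The "download all files" workaround can be scripted roughly as follows. This is a sketch, not an official recipe: `download_single_file` is a hypothetical helper, it assumes `ml_client.jobs.download` accepts `name` and `download_path`, and it searches the downloaded tree by file name because the exact folder layout is not described in this thread.

```python
import shutil
import tempfile
from pathlib import Path


def download_single_file(ml_client, job_name: str, file_name: str, out_dir: Path) -> Path:
    """Download all job outputs to a temp folder, then keep only one file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # Download everything; this is the call reported to work in this thread.
        ml_client.jobs.download(name=job_name, download_path=tmp)
        # Search by name, since the downloaded folder structure is an unknown here.
        matches = sorted(
            p for p in Path(tmp).rglob("*") if p.is_file() and p.name == file_name
        )
        if not matches:
            raise FileNotFoundError(f"{file_name!r} not found in outputs of {job_name!r}")
        target = out_dir / file_name
        # If several files share the name, the first in sorted order is taken.
        shutil.copy2(matches[0], target)
    return target
```

This trades bandwidth for simplicity: every output is downloaded even though only one file is kept, so it may be slow for jobs with large outputs.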

0xfabioo commented 4 months ago

I'm also affected by this bug.

taruntadikonda commented 4 months ago

@banibrata-de As you have closed the issue, is the bug fixed?

friggog commented 3 months ago

This bug still seems to be present. ml_client.jobs.download(name=job_name) downloads all the outputs/logs of the job, but specifying output_name does not seem to download anything.

Any updates?

IliasAarab commented 2 months ago

Same issue observed from my side.

diondrapeck commented 2 months ago

Per @banibrata-de's comments above, we'll be closing this as not planned for now. Please use the workaround in the meantime and we will update this issue in the future once a fix is planned and executed.

0xfabioo commented 2 months ago

@diondrapeck So, you will close this ticket without a bug fix? How do you make sure this will ever get fixed?

diondrapeck commented 2 months ago

@0xfabioo A fix for it is not currently planned, so we won't leave an issue open for it. We have internal processes to keep track of items to consider working on in the future. It will be tracked there.

fepegar commented 4 weeks ago

Here's my workaround:

from pathlib import Path

from azure.ai.ml import MLClient
from azure.ai.ml._artifacts._artifact_utilities import download_artifact_from_aml_uri

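def _build_aml_file_uri(run_id: str, aml_path: Path) -> str:
    # Job artifacts live under the workspace artifact store, keyed by "dcid.<run_id>".
    base_uri = "azureml://datastores/workspaceartifactstore/ExperimentRun"
    # as_posix() keeps forward slashes in the URI even when running on Windows.
    uri = f"{base_uri}/dcid.{run_id}/{aml_path.as_posix()}"
    return uri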

def download_file_from_run(
    ml_client: MLClient,
    run_id: str,
    aml_path: Path,
    out_dir: Path,
) -> Path:
    uri = _build_aml_file_uri(run_id, aml_path)
    download_artifact_from_aml_uri(
        uri=uri,
        destination=str(out_dir),
        datastore_operation=ml_client.jobs._datastore_operations,
    )
    return out_dir / aml_path.name