MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

[Azure ML SDK v2] Registering Data Asset from Job not working #98516

Open tomasvanpottelbergh opened 1 year ago

tomasvanpottelbergh commented 1 year ago

Following this example I tried to register the output of a job as a Data Asset, by using the azureml:<my_data>:<version> syntax for the path. The following minimal example shows what I'm trying to achieve:

from azure.ai.ml import command
from azure.ai.ml.entities import Data
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes

my_job_outputs = {
    "prep_data": Output(type=AssetTypes.URI_FOLDER, path="azureml:test_output:1")
}

job = command(
    command="echo 'test' > ${{outputs.prep_data}}",
    outputs=my_job_outputs,
    environment="test-env@latest",
    compute="cpu-cluster",
)

# submit the command
returned_job = ml_client.create_or_update(job)

Unfortunately, this fails. The job submission succeeds, but the job immediately goes to the Failed status with the following error message: Invalid output uri azureml:test_output:1/ found for output prep_data of run , the list of supported uri formats are ["wasb://", "wasbs://", "adl://", "abfs", "abfss://", "azureml://"]

Could you clarify what the correct syntax is for registering a Data Asset from a Job output?
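
To illustrate why the path above is rejected, here is a minimal sketch that checks a path against the prefixes listed in the error message. The helper `is_supported_output_uri` is hypothetical and not part of the SDK, and it assumes the service does a plain prefix match, which this thread does not confirm. The datastore name `workspaceblobstore` in the second call is only an example.

```python
# Prefixes copied verbatim from the error message quoted in this issue.
SUPPORTED_OUTPUT_URI_PREFIXES = (
    "wasb://",
    "wasbs://",
    "adl://",
    "abfs",
    "abfss://",
    "azureml://",
)


def is_supported_output_uri(path: str) -> bool:
    """Hypothetical helper: True if `path` starts with one of the listed prefixes.

    This only mirrors the list in the error message; the real service-side
    validation may be stricter.
    """
    return path.startswith(SUPPORTED_OUTPUT_URI_PREFIXES)


# The data asset short form has a single colon after "azureml", so it fails.
print(is_supported_output_uri("azureml:test_output:1"))  # False

# A datastore URI starts with "azureml://", so it passes the prefix check.
print(is_supported_output_uri(
    "azureml://datastores/workspaceblobstore/paths/test_output/"
))  # True
```

Under that assumption, `azureml:test_output:1` fails only because it lacks the `//` after the scheme, which matches the error shown above.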



AjayBathini-MSFT commented 1 year ago

@tomasvanpottelbergh Thanks for your feedback! We will investigate and update as appropriate.

RamanathanChinnappan-MSFT commented 1 year ago

@tomasvanpottelbergh

We have checked this issue. You used a datastore path instead of a data asset path, which is why you got this error. Please go through the document and try again. [image]

Please note that this GitHub forum is dedicated to docs-related issues. For technical queries or clarifications, we encourage you to use the Microsoft Q&A platform. Kindly raise your query there.

tomasvanpottelbergh commented 1 year ago

Thank you for your answer @RamanathanChinnappan-MSFT. Could you please clarify, because I'm using exactly the syntax for data assets, as suggested in the docs and in your screenshot. I will also raise this on the Q&A Platform, but I think this is an issue with the docs, since the provided syntax does not work.

RamanathanChinnappan-MSFT commented 1 year ago

@tomasvanpottelbergh

Thanks for your feedback! We have assigned the issue to the author and will provide further updates.

tomasvanpottelbergh commented 1 year ago

Following the release of registering data assets directly from outputs, the documentation should be updated to reflect the accepted schema. See https://github.com/Azure/azure-sdk-for-python/issues/26618#issuecomment-1432598893 for more information.

xiaoyu-work commented 1 month ago

I got the same error here, following the same format: azureml:<my_data>:<version>

ynpandey commented 1 month ago

Tagging @SturgeonMi

PesalaPavan commented 1 month ago

@SturgeonMi Please review it.

SturgeonMi commented 1 month ago

This document is not clear. The supported formats for input and output are not the same. If you want to specify the name and version of the output data asset, you can use the name and version parameters. However, the path parameter for an output doesn't support the "azureml:<name>:<version>" format. The PR to modify the document is here: https://github.com/MicrosoftDocs/azure-docs-pr/pull/276903

xiaoyu-work commented 1 month ago

@SturgeonMi @ynpandey I also tried setting name and version without path, but I got this error: "ModelAssetPathNotFoundInStorage: No blobs found in storage at model asset path: azureml/6211e6d8-b49f-463b-8016-5b1d74ef304d/AzureMLModels/". Another question: when only name and version are set, how can I control whether the output is registered as a Model or as Data?

SturgeonMi commented 1 month ago

You can set the type of the input/output, as described in:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?view=azureml-api-2&tabs=python
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-inputs-outputs-pipeline?view=azureml-api-2&tabs=cli#types-of-inputs-and-outputs