Azure ML, Data output path auto-includes /INPUT_input_data/ #31734
Working on Azure ML Studio, using the Python 3.10 SDK kernel.
Package Name: azure.ai.ml
Package Version: 1.8.0
Operating System: Windows (client), but the job runs on Azure ML Studio compute, which is Linux-based
Python Version: 3.10
Describe the bug
My source folder "/input/data" contains four CSV files, and my destination folder is "/output/data".
Using command="cp -r ${{inputs.input_data}} ${{outputs.output_data}}"
I then get an additional folder in the output path:
"/output/data/INPUT_input_data"
rather than just the four CSVs.
I tried various settings, but INPUT_input_data is always inserted somehow.
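A plausible explanation (my assumption, not confirmed by the SDK docs): Azure ML mounts the input under a directory whose basename is derived from the input name (here INPUT_input_data) and substitutes that full path into the command; `cp -r SRC DST` with an existing DST then copies SRC itself into DST rather than SRC's contents. A minimal simulation of both behaviors with shutil:

```python
# Simulated mount layout (an assumption about how Azure ML names the
# input mount): the input lands under a directory called INPUT_input_data.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
mount = os.path.join(root, "INPUT_input_data")  # simulated input mount
out = os.path.join(root, "output")              # simulated output mount
os.makedirs(mount)
os.makedirs(out)
open(os.path.join(mount, "heart.csv"), "w").close()

# `cp -r SRC DST` with an existing DST copies SRC *itself* into DST:
shutil.copytree(mount, os.path.join(out, os.path.basename(mount)))
print(sorted(os.listdir(out)))  # ['INPUT_input_data']

# Copying SRC's *contents* instead (like `cp -r SRC/. DST`) avoids the nesting:
shutil.rmtree(os.path.join(out, "INPUT_input_data"))
shutil.copytree(mount, out, dirs_exist_ok=True)
print(sorted(os.listdir(out)))  # ['heart.csv']
```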
To Reproduce
Using the tutorial for accessing and writing data, I wanted to test some folder-to-folder copy operations.
For that I created an Azure storage container with hierarchical namespace enabled and created the folders:
input/data
output/data
from azure.ai.ml import command, Input, Output, MLClient
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.identity import DefaultAzureCredential

# Set your subscription, resource group, workspace and compute names:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"
compute_target = "<COMPUTE_NAME>"

# Connect to the Azure ML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

input_path = "abfss://***@***.dfs.core.windows.net/input/data"
output_path = "abfss://***@***.dfs.core.windows.net/output/data"

data_type_in = AssetTypes.URI_FOLDER
data_type_out = AssetTypes.URI_FOLDER
input_mode = InputOutputModes.RO_MOUNT
output_mode = InputOutputModes.RW_MOUNT

# Set the input and output for the job:
inputs = {
    "input_data": Input(type=data_type_in, path=input_path, mode=input_mode)
}
outputs = {
    "output_data": Output(
        type=data_type_out,
        path=output_path,
        mode=output_mode,
    )
}

# This command job copies the data to the specified output path
job = command(
    command="cp -r ${{inputs.input_data}} ${{outputs.output_data}}",  # folder > folder
    inputs=inputs,
    outputs=outputs,
    environment="azureml://registries/azureml/environments/sklearn-1.1/versions/4",
    compute=compute_target,
)

# Submit the command
ml_client.jobs.create_or_update(job)
The Azure storage container will then have a folder
"/output/data/INPUT_input_data" containing the heart classifier data,
but I do not specify anywhere that it should use INPUT_input_data.
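A workaround worth trying (my assumption based on cp semantics, not an SDK-documented fix): append /. to the input reference so cp copies the mounted directory's contents rather than the directory itself:

```python
# Hypothetical variant of the command string used above: `cp -r SRC/. DST`
# copies the *contents* of SRC into DST, so the mounted input directory's
# own name (INPUT_input_data) would not appear under the output path.
copy_contents_cmd = "cp -r ${{inputs.input_data}}/. ${{outputs.output_data}}"
print(copy_contents_cmd)
```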
I used the heart classifier data, downloaded it from https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/batch/deploy-models/heart-classifier-mlflow/data and uploaded it to input/data.
Then I followed the access data tutorial at https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?view=azureml-api-2&tabs=python#write-data-from-your-azure-machine-learning-job-to-azure-storage and changed it to a folder-to-folder scenario, as per the code above.
Expected behavior
I simply wanted the four CSVs in /output/data, with no extra folder added, just like in https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/batch/deploy-models/heart-classifier-mlflow/data
Additional context
I tried about 20 different variants and also tried to get the job history written out to a dummy file, but that file turned out to be empty.