MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Upload ML output to data storage #89949

Closed abhijit-z closed 2 years ago

abhijit-z commented 2 years ago

Recommend adding content on how to push or upload machine learning output data (CSV, XLSX, or any other file) to a data store (blob storage or a data lake).



YutongTie-MSFT commented 2 years ago

Thanks for the feedback! We are currently investigating and will update you shortly.

YutongTie-MSFT commented 2 years ago

Thanks for the feedback again! I have assigned the issue to the content author to investigate further and update the document as appropriate.

YutongTie-MSFT commented 2 years ago

@ynpandey Hello Yogi, could you please check on this feedback to see if we need to update the document? Thanks.

nibaccam commented 2 years ago

Hi @abhijit-z, I want to make sure I'm understanding the ask here. Are you looking for guidance on how to upload your output data file directly to data storage?

abhijit-z commented 2 years ago

Hi @nibaccam, thanks for the response! I am generating an output, say in .xlsx format, and I want to write or store it directly in blob storage or a data lake.

nibaccam commented 2 years ago

Hi @abhijit-z, you can upload data files to your storage with Dataset.File.upload_directory(). This method creates a FileDataset object and uploads your files directly to the storage container in a single call.

For a code example, see Create a FileDataset.
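
As a quick illustration, here is a minimal sketch of that flow. The ./output directory and the ml-output target folder are hypothetical placeholders, and the sketch assumes a config.json for your workspace is available locally:

from azureml.core import Workspace, Dataset
from azureml.data.datapath import DataPath

# Load the workspace from a local config.json (assumption: one exists)
ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload everything under ./output (hypothetical directory) to the
# 'ml-output' folder on the default datastore; returns a FileDataset
# that references the uploaded files.
dataset = Dataset.File.upload_directory(
    src_dir='./output',
    target=DataPath(datastore, 'ml-output'),
    overwrite=True,
    show_progress=True,
)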

abhijit-z commented 2 years ago

Hi @nibaccam, Dataset.File.upload_directory() takes a source directory as its parameter, not a single file. I just want to upload a single file. How should I go about that?

nibaccam commented 2 years ago

Hi @abhijit-z, you can leverage the pattern parameter of upload_directory() to indicate the single file you want to upload.

In the example below, I upload the iris.csv file that's in my train-dataset directory on my local machine. The file gets uploaded to a directory of the same name in the storage container that's connected to my workspace's default datastore.

from azureml.core import Workspace, Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()  # load the workspace so ws is defined
datastore = ws.get_default_datastore()
# pattern filters the upload down to the single iris.csv file
Dataset.File.upload_directory(src_dir='./train-dataset', target=DataPath(datastore, 'train-dataset'),
                              pattern='./train-dataset/iris.csv', overwrite=True, show_progress=True)
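
As a side note, the SDK reference describes pattern as a glob-style filter (for example, '*.csv'), so you can also match by extension rather than spelling out the full path; verify this against your azureml-core version, as it isn't shown in the linked doc:

# Hedged sketch: pattern as a glob-style filter rather than a full path.
Dataset.File.upload_directory(src_dir='./train-dataset', target=DataPath(datastore, 'train-dataset'),
                              pattern='*.csv',  # upload only the CSV files in the directory
                              overwrite=True, show_progress=True)
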
abhijit-z commented 2 years ago

Hi @nibaccam, I tried that, but it uploads 0 files. Here is part of the progress output after executing the code:

Uploading an estimated of 0 files
Uploaded 0 files

nibaccam commented 2 years ago

@abhijit-z It could be that something is missing in the pattern parameter. Could you please share what you have for the upload_directory() line of code (like the below)?

Dataset.File.upload_directory(src_dir='./train-dataset', target=DataPath(datastore, 'train-dataset'),
                              pattern='./train-dataset/iris.csv', overwrite=True, show_progress=True)

abhijit-z commented 2 years ago

@nibaccam Please find the code below:

from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

datastore_name = 'ericsondatalake'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

Dataset.File.upload_directory(
    src_dir='/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder',
    target=DataPath(datastore),
    pattern='/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder/try1_user_data.csv',
    overwrite=True,
    show_progress=True,
)

nibaccam commented 2 years ago

@abhijit-z The pattern parameter does an exact-match filter on the path and file name. Since you didn't get an error saying that your directory doesn't exist, I'm thinking the path you're filtering on in the pattern parameter isn't an exact match.

Can you verify that the file name you're referring to matches what's in your source directory and that it has the correct file extension?
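
A quick way to sanity-check this locally before uploading (a hedged sketch; the paths below echo your snippet, so adjust as needed):

import os

src_dir = '/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder'
file_name = 'try1_user_data.csv'

# List the actual file names so you can compare spelling and
# capitalization against the value passed to pattern.
print(os.listdir(src_dir))

# pattern does an exact match, so any case difference here
# results in 0 files being uploaded.
print(os.path.exists(os.path.join(src_dir, file_name)))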

abhijit-z commented 2 years ago

Thanks @nibaccam! I checked the name again; one letter was capitalized, which I had missed. I executed the code with the correct filename and then checked the blob storage, and I can see the uploaded file. Could you please share a link that shows an example of building an end-to-end Azure Machine Learning pipeline?

nibaccam commented 2 years ago

Great! Glad that worked out for you. Take a look at the following tutorials for step-by-step examples.
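
For orientation, here is a minimal sketch of what an SDK v1 pipeline looks like. The compute target name, script name, and directory are hypothetical placeholders, not values from your workspace:

from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
compute_target = ComputeTarget(workspace=ws, name='cpu-cluster')  # hypothetical cluster name

# A single step that runs a training script; real pipelines chain
# several steps and pass data between them.
train_step = PythonScriptStep(
    name='train',
    script_name='train.py',        # hypothetical script
    source_directory='./scripts',  # hypothetical directory
    compute_target=compute_target,
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
run = Experiment(ws, 'demo-pipeline').submit(pipeline)
run.wait_for_completion(show_output=True)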

We will now proceed to close this thread.

please-close