Recommend content to push or upload the machine learning output data (csv or xlsx or any file) to data store (blob or data lake)

Closed: abhijit-z closed this issue 2 years ago.
Thanks for the feedback! We are currently investigating and will update you shortly.
Thanks for the feedback again! I have assigned the issue to the content author to investigate further and update the document as appropriate.
@ynpandey Hello Yogi, could you please check on this feedback to see if we need to update the document? Thanks.
Hi @abhijit-z, I want to make sure I'm understanding the ask here. Are you looking for guidance on how to upload your output data file directly to data storage?
Hi @nibaccam Thanks for the response! I am generating an output, say in .xlsx format, and I want to write or store it directly in blob storage or a data lake.
Hi @abhijit-z you can upload data files to your storage with upload_directory(). This method creates a FileDataset object and uploads the files directly to your storage container in a single call.
For a code example, see how to Create a FileDataset.
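A minimal sketch of that flow, assuming a workspace config file is present and using a hypothetical local folder named ./output-data, might look like this:

```python
from azureml.core import Workspace, Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()            # loads the workspace from config.json
datastore = ws.get_default_datastore()  # the workspace's default blob datastore

# Uploads everything under ./output-data to an "output-data" folder in the
# datastore and returns a FileDataset pointing at the uploaded files.
dataset = Dataset.File.upload_directory(
    src_dir='./output-data',
    target=DataPath(datastore, 'output-data'),
    overwrite=True,
    show_progress=True,
)
```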
Hi @nibaccam Dataset.File.upload_directory() takes a source directory as its input, not a single file. I just want to upload one file. How should I go about that?
Hi @abhijit-z, you can leverage the pattern parameter of upload_directory() to indicate the single file you want to upload.
In the example below, I upload the iris.csv file that's in my train-dataset directory on my local machine. The file gets uploaded to a directory of the same name in the storage container that's connected to my workspace's default datastore:
```python
from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()  # load the workspace so ws is defined
datastore = ws.get_default_datastore()

Dataset.File.upload_directory(src_dir='./train-dataset',
                              target=DataPath(datastore, 'train-dataset'),
                              pattern='./train-dataset/iris.csv',  # the single file to upload
                              overwrite=True,
                              show_progress=True)
```
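(Per the SDK reference, pattern is a glob-style filter, so a wildcard like pattern='*.csv' should also work if you'd rather match by extension than by the full path; I'm noting that from the docs rather than from anything tested in this thread.)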
Hi @nibaccam I tried that, but it uploads 0 files. Here is part of the progress output after executing the code:

```
Uploading an estimated of 0 files
Uploaded 0 files
```
@abhijit-z It could be that something is off in the pattern parameter. Could you please share what you have for the upload_directory() line of code (like the below)?
```python
Dataset.File.upload_directory(src_dir='./train-dataset',
                              target=DataPath(datastore, 'train-dataset'),
                              pattern='./train-dataset/iris.csv',
                              overwrite=True,
                              show_progress=True)
```
@nibaccam Please find the code below:

```python
from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

datastore_name = 'ericsondatalake'
workspace = Workspace.from_config()
datastore = Datastore.get(workspace, datastore_name)

Dataset.File.upload_directory(
    src_dir='/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder',
    target=DataPath(datastore),
    pattern='/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder/try1_user_data.csv',
    overwrite=True,
    show_progress=True)
```
@abhijit-z the pattern parameter does an exact-match filter on the path and file name. Since you didn't get an error saying that your directory doesn't exist, I suspect the path you're filtering on in the pattern parameter isn't an exact match.
Can you verify that the file name you're referring to matches what's in your source directory, and that it has the correct file extension?
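One quick way to check this locally (my own suggestion, not from the SDK docs) is to ask glob what the pattern actually matches before uploading; on Linux the match is case-sensitive, so an empty result points at a casing or spelling mismatch:

```python
import glob

# Same paths as in the snippet above; glob matching is case-sensitive on Linux.
src_dir = '/mnt/batch/tasks/shared/LS_root/mounts/clusters/abhijit2/code/Users/abhijit/Ericson Cross selling and Reorder'
matches = glob.glob(src_dir + '/try1_user_data.csv')
print(matches)  # an empty list means the path or file name doesn't match exactly
```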
Thanks @nibaccam I checked the name again. One of the letters was capitalized, which I had missed. I executed the code with the correct file name, checked the blob storage, and I can see the uploaded file. Could you please share a link with an example of building an end-to-end Azure Machine Learning pipeline?
Great! Glad that worked out for you. Take a look at the following tutorials for step-by-step examples.
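As a starting point, a minimal Azure ML pipeline (v1 SDK) sketch looks like the below; the compute target name 'cpu-cluster' and the train.py script are placeholders for your own resources:

```python
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# A single pipeline step that runs a training script on an existing cluster.
train_step = PythonScriptStep(
    name='train',
    script_name='train.py',        # your training script
    source_directory='./scripts',  # folder containing train.py
    compute_target='cpu-cluster',  # an existing compute cluster in the workspace
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
run = Experiment(ws, 'demo-pipeline').submit(pipeline)
run.wait_for_completion(show_output=True)
```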
We will now proceed to close this thread.