Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

DatasetConsumptionConfig and PipelineParameter cannot be reused #1312

Open stanton119 opened 3 years ago

stanton119 commented 3 years ago

Following the tutorial to create ML datasets as pipeline parameters: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-dataset-and-pipelineparameter.ipynb

This requires creating a PipelineParameter and then a DatasetConsumptionConfig object. If multiple steps require the same dataset input, you would expect to be able to reuse the same PipelineParameter for all of them. However, when that PipelineParameter is wrapped in a DatasetConsumptionConfig, this doesn't work.

When building the pipeline, we get the error: 'PipelineDataset' object has no attribute '_get_datapath'

It seems the object is altered the first time it is used, so using it a second time breaks.

The workaround at the moment is to create a new PipelineParameter for every step that uses the dataset, so the number of pipeline parameters to configure grows very quickly.
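A minimal sketch of that workaround, assuming the `PipelineParameter(name=..., default_value=...)` and `DatasetConsumptionConfig(name, dataset)` constructors used in the linked tutorial. The helper name `per_step_dataset_inputs`, the `base_name` naming scheme, and the injectable `parameter_cls` / `config_cls` arguments are hypothetical and only there for illustration; this is not an official fix.

```python
def per_step_dataset_inputs(dataset, step_names, base_name="input_dataset",
                            parameter_cls=None, config_cls=None):
    """Build a fresh PipelineParameter and DatasetConsumptionConfig per step.

    Hypothetical helper: nothing is shared between steps, so no object is
    used a second time. Returns a dict mapping step name -> config object.
    """
    if parameter_cls is None or config_cls is None:
        # Imported lazily so the helper can be exercised without the SDK.
        from azureml.pipeline.core import PipelineParameter
        from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
        parameter_cls = parameter_cls or PipelineParameter
        config_cls = config_cls or DatasetConsumptionConfig

    inputs = {}
    for step in step_names:
        # One parameter per step (assumed naming scheme: <base_name>_param_<step>).
        param = parameter_cls(name=f"{base_name}_param_{step}",
                              default_value=dataset)
        # One consumption config per step, wrapping that step's own parameter.
        inputs[step] = config_cls(f"{base_name}_{step}", param)
    return inputs


# Usage sketch (assumes an existing `file_dataset`; step names are made up):
# inputs = per_step_dataset_inputs(file_dataset, ["prep", "train"])
# prep_input = inputs["prep"].as_mount()
# train_input = inputs["train"].as_mount()
```

This still leaves one pipeline parameter per step to configure at submission time; it only removes the boilerplate of creating them by hand.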

Tail end of the error log (screenshot; can't copy/paste the text): [image]

SturgeonMi commented 3 years ago

Thanks for the feedback, Stanton! We created an internal task to investigate and track. Will update here.

Thanks, Xun

anirbansaha96 commented 3 years ago

Are there any updates on this? This is a major blocker for our Production environment.

anirbansaha96 commented 3 years ago

The problem is not limited to reusing the parameter across steps. It also occurs if you submit the experiment again.