I am trying to create pipeline parameters for variable data access to a Synapse DataLakeGen2 datastore and invoke the pipeline with the 'Machine Learning Execute Pipeline' activity in Azure Synapse. According to the Microsoft docs, datasets are the recommended way to interact with the AzureDataLakeGen2Datastore class. I verified this by trying DataPathComputeBinding with both the 'mount' and the 'download' mode, neither of which is supported for Gen2 datastores. I then tried the DatasetConsumptionConfig class to pass the data to the compute target, which requires a dataset as the pipeline parameter. Unfortunately, the 'Machine Learning Execute Pipeline' activity only supports string or DataPath variables, so I could not find a way to pass a Dataset.
I then tried to use the DataPath as the parameter input and convert it to a dataset, but the PipelineParameter class does not seem to provide any method to retrieve the underlying DataPath:
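from azureml.core import Dataset
from azureml.data.datapath import DataPath
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.pipeline.core import PipelineParameter

# datastore and path are defined elsewhere (the Gen2 datastore and the path on it)
datapath = DataPath(datastore=datastore, path_on_datastore=path)
data_path_pipeline_param = PipelineParameter(name="input_data", default_value=datapath)
# Does not work: the PipelineParameter is passed where a path is expected
dataset_parquet = Dataset.Tabular.from_parquet_files(data_path_pipeline_param)
ds_consumption = DatasetConsumptionConfig("input", dataset_parquet)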
Is there a recommended way to do this?