Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

Pipeline parameters used with DataPath and DataPathComputeBinding to specify side inputs of Parallel pipeline #1801

Open pop134 opened 2 years ago

pop134 commented 2 years ago

I'm following this example to create a PipelineParameter for my parallel pipeline:

from azureml.core.datastore import Datastore
from azureml.data.datapath import DataPath, DataPathComputeBinding
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineParameter

datastore = Datastore(workspace=workspace, name="workspaceblobstore")
datapath = DataPath(datastore=datastore, path_on_datastore='input_data')

# The docs example passes a (PipelineParameter, DataPathComputeBinding) tuple
data_path_pipeline_param = (PipelineParameter(name="input_data", default_value=datapath),
                            DataPathComputeBinding(mode='mount'))

train_step = PythonScriptStep(script_name="train.py",
                              arguments=["--input", data_path_pipeline_param],
                              inputs=[data_path_pipeline_param],
                              compute_target=compute_target,
                              source_directory=project_folder)
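
For reference, the point of this pattern is that the path can then be overridden at submission time. Roughly like this (the experiment name here is just illustrative):

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=workspace, steps=[train_step])

# Override the DataPath default at submission, keyed by the PipelineParameter name
run = Experiment(workspace, "datapath-param-demo").submit(
    pipeline,
    pipeline_parameters={"input_data": DataPath(datastore=datastore,
                                                path_on_datastore="new_input_data")}
)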

This is my code that creates the pipeline with the parameter:

from azureml.data.datapath import DataPath, DataPathComputeBinding
from azureml.pipeline.core import PipelineParameter
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

path = DataPath(datastore=default_store, path_on_datastore='path')
input_param = (PipelineParameter(name="param_name", default_value=path),
               DataPathComputeBinding(mode='mount'))

parallel_run_config = ParallelRunConfig(
    source_directory=script_dir,
    entry_script='script.py',  # the user script to run against each input
    partition_keys=['key'],
    error_threshold=50,
    output_action='append_row',
    environment=environment,
    compute_target=compute_target,
    node_count=2,
    run_invocation_timeout=1200
)

parallel_run_step = ParallelRunStep(
    name='test-batch-inference',
    inputs=[partition_input],
    side_inputs=[input1, input2, input_param],  # input_param is the (PipelineParameter, binding) tuple
    output=output_dir,
    parallel_run_config=parallel_run_config,
    arguments=['--input_param', input_param],
    allow_reuse=False
)

And it raised this error:

Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'tuple'>

I'm using azureml-core==1.40.0.post2 and azureml-pipeline==1.40.0. It seems the sample code is not supported with these versions? Before trying the DataPath as a pipeline parameter, I tried an int-typed parameter and it worked fine.
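
For comparison, the int-typed parameter I tested looked roughly like this (names here are illustrative, not my exact code):

from azureml.pipeline.core import PipelineParameter

# A plain scalar PipelineParameter passes through ParallelRunStep arguments without error
batch_size_param = PipelineParameter(name="batch_size", default_value=32)

parallel_run_step = ParallelRunStep(
    name='test-batch-inference',
    inputs=[partition_input],
    output=output_dir,
    parallel_run_config=parallel_run_config,
    arguments=['--batch_size', batch_size_param],
    allow_reuse=False
)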



Zeleni9 commented 1 year ago

It seems that PythonScriptStep takes the tuple defined as you did above, but for ParallelRunStep you need to provide one of the types listed in the error message:


# Works for PythonScriptStep:
datastore = Datastore(workspace=workspace, name="workspaceblobstore")
datapath = DataPath(datastore=datastore, path_on_datastore='input_data')
data_path_pipeline_param = (PipelineParameter(name="input_data", default_value=datapath),
                            DataPathComputeBinding(mode='mount'))

# Should be changed to this for ParallelRunStep:
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig

datastore = Datastore(workspace=workspace, name="workspaceblobstore")
datapath = DataPath(datastore=datastore, path_on_datastore='input_data')
input_data_parameter = PipelineParameter(name="input_data", default_value=datapath)
input_data_consumption = DatasetConsumptionConfig("input_data_videos", input_data_parameter).as_mount()
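
The resulting DatasetConsumptionConfig can then go straight into the step. A sketch based on the snippet from the issue, reusing its placeholders (partition_input, output_dir, parallel_run_config):

parallel_run_step = ParallelRunStep(
    name='test-batch-inference',
    inputs=[partition_input],
    side_inputs=[input_data_consumption],  # DatasetConsumptionConfig, not a tuple
    output=output_dir,
    parallel_run_config=parallel_run_config,
    arguments=['--input_param', input_data_consumption],
    allow_reuse=False
)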