Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

Can't pass DataReference to DatabricksStep #1608

Open HakzA opened 3 years ago

HakzA commented 3 years ago

When passing a DataReference to my DatabricksStep:

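from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import DatabricksStep

# ds (a registered Datastore) and databricks_compute (an attached
# Databricks compute target) are defined earlier in the script.
day_files = DataReference(datastore=ds,
                          path_on_datastore='day_files',
                          data_reference_name='day_files',
                          )

python_script_name = "calculate_responsible_turnover.py"
source_directory = "./"

calculate_responsible_turnover_step = DatabricksStep(name="calculate_responsible_turnover",
                                                     # passing the DataReference object here raises the error below
                                                     python_script_params=['--data-path', day_files],
                                                     inputs=[day_files],
                                                     num_workers=1,
                                                     python_script_name=python_script_name,
                                                     source_directory=source_directory,
                                                     run_name='calculate_responsible_turnover',
                                                     compute_target=databricks_compute,
                                                     allow_reuse=False
                                                     )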

I'm getting the error:

Traceback (most recent call last):
  File "/opt/project/collect_data/main.py", line 39, in <module>
    calculate_responsible_turnover_step = DatabricksStep(name="calculate_responsible_turnover",
  File "/opt/conda/lib/python3.8/site-packages/azureml/pipeline/steps/databricks_step.py", line 398, in __init__
    super(DatabricksStep, self).__init__(
  File "/opt/conda/lib/python3.8/site-packages/azureml/pipeline/core/_databricks_step_base.py", line 385, in __init__
    self._params["python_script_params"] = self._encode_string_params(python_script_params)
  File "/opt/conda/lib/python3.8/site-packages/azureml/pipeline/core/_databricks_step_base.py", line 578, in _encode_string_params
    final_params_list.append(value.replace("|", "|-"))
AttributeError: 'DataReference' object has no attribute 'replace'
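The last frame shows the cause: _encode_string_params escapes every script parameter with value.replace("|", "|-"), and only strings have a replace method. A simplified, hypothetical stand-in for just that loop (not the SDK's actual code; FakeDataReference is an invented placeholder for the real class) reproduces the failure without installing azureml:

```python
# Simplified, hypothetical stand-in for the loop visible in the traceback
# (_databricks_step_base._encode_string_params); not the SDK's real code.
def encode_string_params(params):
    encoded = []
    for value in params:
        # Only str has .replace(); any other object raises AttributeError here.
        encoded.append(value.replace("|", "|-"))
    return encoded


class FakeDataReference:
    """Hypothetical placeholder standing in for azureml's DataReference."""


# Plain strings pass through, with "|" escaped as "|-".
print(encode_string_params(["--data-path", "day|files"]))  # ['--data-path', 'day|-files']

# A non-string parameter fails the same way as in the issue.
try:
    encode_string_params(["--data-path", FakeDataReference()])
except AttributeError as err:
    print(err)  # 'FakeDataReference' object has no attribute 'replace'
```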

It seems DatabricksStep expects a string, but if I cast my DataReference to a string it becomes $AZUREML_DATAREFERENCE_day_files instead of the actual path. I'm using the following package versions:

      - azureml-sdk==1.34.0
      - azureml-core==1.34.0
      - azureml-pipeline-core==1.34.0
      - azureml-datadrift==1.21.0
      - azureml-telemetry==1.23.0
      - azure-keyvault-secrets==4.2.0
      - azureml-pipeline-steps==1.34.0

Any suggestions on how to proceed with this problem?

LaurieCantaloube commented 3 years ago

Delete the line python_script_params=['--data-path', day_files], ;) The DataReference must appear only in inputs.
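With the DataReference listed only in inputs, the script still has to find the data. A minimal sketch of the script side, assuming (this is not confirmed in the thread) that DatabricksStep hands each input to the Python script as a command-line argument named after its data_reference_name, here the hypothetical flag --day_files. Printing sys.argv in the Databricks driver log is a quick way to check the real argument name before relying on it:

```python
import argparse


def get_day_files_path(argv=None):
    """Return the input path passed to the script on the command line.

    ASSUMPTION: the step passes the DataReference as --day_files <path>
    (named after data_reference_name). Verify against sys.argv first.
    With argv=None, argparse reads sys.argv[1:].
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--day_files", required=True)
    # parse_known_args collects unrecognized arguments instead of exiting,
    # so any extra arguments the step may add do not abort the script.
    args, _unknown = parser.parse_known_args(argv)
    return args.day_files
```

Inside calculate_responsible_turnover.py you would call get_day_files_path() with no arguments so it reads the real command line; passing an explicit list, e.g. get_day_files_path(["--day_files", "/mnt/day_files", "--extra", "x"]), returns "/mnt/day_files" and is handy for local testing.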